Abstract

We consider the problem of determining the mixed quantum state of a large 
but finite number of identically prepared quantum systems from data obtained 
in a sequence of ideal (von Neumann) measurements, each performed on an 
individual copy of the system. In contrast to previous approaches, we do not 
average over the possible unknown states but work out a "typical" probability 
distribution on the set of states, as implied by the experimental data. As a 
consequence, any measure of knowledge about the unknown state and thus 
any notion of "best strategy" (i.e. the choice of observables to be measured, 
and the number of times they are measured) depend on the unknown state. 
By learning from previously obtained data, the experimentalist re-adjusts the 
observable to be measured in the next step, eventually approaching an optimal 
strategy. 

We consider two measures of knowledge and exhibit all "best" strategies 
for the case of a two-dimensional Hilbert space. Finally, we discuss some 
features of the problem in higher dimensions and in the infinite dimensional 
case. 
1 Introduction



The topic of this paper is the problem of determining a quantum state from measurement data. We consider a quantum system described by a Hilbert space $\mathcal H$ of (finite) dimension $d$. Given a large but finite number of copies of the system, all prepared in the same quantum state $\tau$, we are allowed to perform an arbitrary (ideal) measurement on each copy. What do we know about the state $\tau$ after these measurements, and what is the best strategy to maximize the information gained? Several authors have considered problems of this type. Their approaches differ in some respects, in particular regarding the measurement strategy and the way in which the knowledge about the unknown state $\tau$ is quantified.

The strategy analyzed by Wootters and Fields [1] consists of choosing, once and for all, a family of $d+1$ observables and measuring each of them an equal number of times, each measurement being performed on a separate copy of the system. The knowledge gained in these measurements is quantified in terms of the average (over all possible unknown states) of an appropriately defined "uncertainty volume" in the set of states, which essentially stems from the Shannon [2] information measure. They arrive at the result that the average gain of knowledge in such a procedure is maximal if the $d+1$ observables measured are mutually unbiased (complementary). This optimal strategy is, by definition, independent of the actual (unknown) state $\tau$. Their paper is sometimes cited as proving that measuring mutually unbiased observables is the most efficient way of determining an unknown quantum state by means of successive measurements.

Peres and Wootters [3] conjectured that an appropriately designed single combined measurement on a number of identically prepared copies of a quantum system is more efficient than a sequence of measurements on the individual systems (a sequential measurement). Moreover, they provided evidence that generalized (POVM-based, see Refs. [4, 5, 6]) measurements are more effective than ideal measurements of the von Neumann type [7]. Their measure of knowledge is also based on the Shannon information measure.

In a special scenario, Massar and Popescu [8] showed that a combined measurement is more efficient than a sequential one, thus proving ("not in its letter, but in its spirit") the conjecture of Peres and Wootters. In their work, knowledge is measured by a "score" function, defined as the average (over all possible unknown states) of an expression quantifying the difference between a candidate state and the unknown one.

In apparent contradiction to these results, Brody and Meister [9] showed that the minimum Bayesian decision cost, which properly takes into account what is known a priori about the unknown state, is the same for sequential and for combined measurements. They pointed out that the optimal strategy for determining a quantum state depends on the details of the approach, in particular on how the a priori knowledge is treated.

To help clarify these issues, we present a further approach to the state determination problem. We focus on the original scenario of a sequence of ideal (von Neumann) measurements on individual copies of the system. We first compute a "typical" probability distribution on the set $S$ of states, achieved after a (large) number of measurements, retaining the dependence on the unknown state $\tau$ throughout the analysis. In other words, we do not perform an average over all possible unknown states. Thus any measure of knowledge (we discuss two variants, one of them related to the "uncertainty volume" considered by Wootters and Fields) depends on $\tau$, and so do the "best strategies". After arriving at (two variants of) a general variational principle that determines a best strategy, we solve the problem of finding these strategies in detail in two dimensions ($d = 2$), and discuss some features of the problem in higher dimensions.

In contrast to the scenario considered by Wootters and Fields, our experimentalist learns from previously obtained data and uses them to re-adjust the observable measured in the next step, eventually approaching the best strategy for the unknown state $\tau$. We show that, in dimensions larger than 2, the best strategy is sometimes not provided by a family of mutually unbiased observables.

We conclude the paper by giving some remarks on the infinite dimensional case. 
2 Derivation of the probability density



Let us begin by introducing some notation. The spectral decomposition of any observable (hermitean linear operator) reads

$$A = \sum_{a\in\mathrm{Sp}(A)} a\,P_a, \tag{2.1}$$

where $\mathrm{Sp}(A)$ denotes the spectrum (set of eigenvalues) of $A$, and the $P_a$ are the (unique) hermitean projections onto the respective eigenspaces (the spectral projections), satisfying $P_a P_b = 0$ for $a \neq b$ and summing up to the identity operator: $\sum_{a\in\mathrm{Sp}(A)} P_a = 1$. In a given state (density matrix) $\rho$, the probability of obtaining the outcome $a \in \mathrm{Sp}(A)$ in a measurement of $A$ is given by

$$w_a(\rho, A) = \langle P_a\rangle_\rho = \mathrm{Tr}(\rho P_a), \tag{2.2}$$

the symbol $\langle\ \rangle_\rho$ denoting the expectation value in the state $\rho$.
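As a concrete illustration of (2.2), the following minimal Python sketch evaluates $w_a(\rho, A) = \mathrm{Tr}(\rho P_a)$ for a single qubit. It builds a density matrix and the two spectral projections of an observable $c\cdot\sigma$ from the Pauli matrices, anticipating the parametrization used in section 4. The helper names (`bloch_state`, `spectral_projections`, `outcome_probabilities`) are hypothetical and chosen only for readability; matrices are plain nested lists, so no external library is assumed.

```python
# Sketch: outcome probabilities w_a(rho, A) = Tr(rho P_a), eq. (2.2), for d = 2.

SIGMA = [  # Pauli matrices sigma_1, sigma_2, sigma_3
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def half_identity_plus(v, sign=1):
    """(1 + sign * v . sigma) / 2 for a real 3-vector v."""
    return [[(int(i == j) + sign * sum(v[k] * SIGMA[k][i][j] for k in range(3))) / 2
             for j in range(2)] for i in range(2)]


def bloch_state(a):
    """Density matrix rho(a) = (1 + a . sigma) / 2 with |a| <= 1."""
    return half_identity_plus(a)


def spectral_projections(c):
    """P_b = (1 + b c . sigma) / 2 for the eigenvalues b = +1, -1 of c . sigma, |c| = 1."""
    return {b: half_identity_plus(c, b) for b in (+1, -1)}


def trace_of_product(x, y):
    """Tr(x y) for two 2x2 matrices."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def outcome_probabilities(rho, projections):
    """w_a = Tr(rho P_a); the imaginary part vanishes for hermitean rho and P_a."""
    return {a: trace_of_product(rho, p).real for a, p in projections.items()}


w = outcome_probabilities(bloch_state([0.3, 0.0, 0.4]),
                          spectral_projections([0.6, 0.0, 0.8]))
print(w)
```

Here $a\cdot c = 0.5$, so the two probabilities agree with the closed form $\frac12(1 \pm a\cdot c)$ that appears later as (4.3).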

Now suppose we are given $n$ copies of a quantum system, all prepared in the same (unknown) quantum state $\tau$, and we are allowed to perform a sequence of measurements of $n$ observables $(A_1, A_2, \ldots, A_n)$, each on one copy of the system. This setting guarantees that the outcomes, collectively denoted as

$$\mathcal A = (a_1, a_2, \ldots, a_n), \tag{2.3}$$

are statistically independent of each other. Given these data, what can we say about the state? This is a case for an application of Bayes' Theorem of elementary probability theory: given a domain $D$ in the space $S$ of states, we ask for the probability that the measurement outcomes $\mathcal A$ arise from a state contained in $D$. In other words, we ask for a probability distribution describing the likelihood of $\rho$ being responsible for the experimental data. This requires the assumption of an a priori likelihood, i.e. a probability measure on $S$. A natural candidate is the measure $\mathcal D\rho$ induced by the Hilbert-Schmidt geometry (see (2.21) below), but in order to remain open to different choices, we include an additional density $\mu(\rho)$. We will see that the results do not depend heavily on this quantity. Whatever choice is made, Bayes' Theorem tells us that the desired probability distribution on the space of states is given by

$$p_{\mathcal A}(\rho) = C\,\mu(\rho)\prod_{j=1}^n w_{a_j}(\rho, A_j) = C\,\mu(\rho)\exp\Big(\sum_{j=1}^n \ln w_{a_j}(\rho, A_j)\Big), \tag{2.4}$$

where the constant $C$ is chosen such that

$$\int_S \mathcal D\rho\; p_{\mathcal A}(\rho) = 1. \tag{2.5}$$

This is the starting point for our analysis.
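The Bayesian update (2.4)-(2.5) can be made tangible with a deliberately reduced example. The sketch below assumes a one-parameter family of qubit states $\rho(a) = \frac12(1 + a\,\sigma_3)$ with $-1 < a < 1$ (a hypothetical restriction; the analysis in the text works on the full set $S$), a flat prior $\mu = 1$, the flat measure $da$ on the parameter, and a record in which $\sigma_3$ was measured with $n_+$ outcomes $+1$ and $n_-$ outcomes $-1$. It evaluates the exponent of (2.4) on a midpoint grid and normalizes numerically as in (2.5); `posterior_on_grid` is an illustrative name.

```python
import math


def posterior_on_grid(n_plus, n_minus, num_points=2000):
    """Grid values of p_A(a), eq. (2.4), normalized as in (2.5) by a midpoint rule.

    Assumes rho(a) = (1 + a sigma_3)/2, n_plus / n_minus outcomes +1 / -1
    in measurements of sigma_3, and a flat prior mu = 1.
    """
    step = 2.0 / num_points
    grid = [-1.0 + (i + 0.5) * step for i in range(num_points)]  # avoids a = +-1
    # exponent of (2.4): sum over the record of ln w_{a_j}(rho, A_j)
    log_p = [n_plus * math.log((1 + a) / 2) + n_minus * math.log((1 - a) / 2)
             for a in grid]
    top = max(log_p)  # subtract the maximum before exponentiating (avoids underflow)
    weights = [math.exp(x - top) for x in log_p]
    norm = sum(weights) * step  # discretized version of (2.5)
    return grid, [x / norm for x in weights]


grid, density = posterior_on_grid(n_plus=140, n_minus=60)
peak = grid[max(range(len(grid)), key=density.__getitem__)]
print(f"posterior peak near a = {peak:.4f}")
```

With 140 outcomes $+1$ out of 200, the maximum sits next to $a = 0.4$; scaling both counts by the same factor leaves the peak in place and narrows it like $n^{-1/2}$.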

The probability density (2.4) is defined for any experimental record $\mathcal A$ consisting of all measurement outcomes (2.3). For small $n$, the statistical fluctuations in the data lead to a strong dependence of $p_{\mathcal A}(\rho)$ on $\mathcal A$. When the number of measurements is increased in an appropriate way, the fluctuations are suppressed to any desired degree. Roughly estimated, the statistical error in the exponent of (2.4) is of the order $n^{-1/2}$ times the order of the exponent itself.

A particularly simple setup in which the statistical fluctuations may be suppressed in a controlled way is to choose a smaller set of mutually different observables $(B_1, B_2, \ldots, B_m)$, $m \ll n$, and to repeat each of them sufficiently often. In other words, the sequence $(A_1, A_2, \ldots, A_n)$ is chosen to be of the form

$$(B_1, \ldots, B_1,\; B_2, \ldots, B_2,\; \ldots,\; B_m, \ldots, B_m). \tag{2.6}$$

If $B_\beta$ is measured $n_\beta$ times ($\sum_{\beta=1}^m n_\beta = n$), the number of measurements may be scaled up uniformly by simply replacing $n_\beta \to k\,n_\beta$ for sufficiently large $k$, while $m$ is kept constant.
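The bookkeeping behind scheme (2.6) is simple enough to state in code. In this sketch the observables are represented by arbitrary labels (a hypothetical stand-in for actual operators), and the uniform scale-up $n_\beta \to k\,n_\beta$ changes the length $n$ of the sequence but not the number $m$ of distinct observables.

```python
def measurement_sequence(observables, counts, k=1):
    """Sequence (A_1, ..., A_n) of the form (2.6): B_beta repeated k * n_beta times."""
    if len(observables) != len(counts):
        raise ValueError("need exactly one count n_beta per observable B_beta")
    sequence = []
    for b, n_beta in zip(observables, counts):
        sequence.extend([b] * (k * n_beta))
    return sequence


base = measurement_sequence(["B1", "B2", "B3"], [2, 1, 3])
scaled = measurement_sequence(["B1", "B2", "B3"], [2, 1, 3], k=10)
print(len(base), len(scaled), len(set(scaled)))  # 6 60 3
```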

Before coming to the main part of our derivation, let us describe the underlying idea. We assume that sufficiently many different observables have been chosen (details to be specified below), and for the moment we ignore $\mu(\rho)$ in (2.4). Once the statistical fluctuations are guaranteed to be small, most experimental data (2.3) will render (2.4) very close to a family of "typical" probability distributions. For large $n$, a typical $p_{\mathcal A}(\rho)$ may well be approximated by a Gaussian peaked around some density matrix $\rho_{\mathcal A}$. The latter represents the "best guess" for the unknown state, i.e. for $\tau$. Some general properties of $p_{\mathcal A}(\rho)$ may be inferred from the fact that the exponent in (2.4) is a sum of $n$ statistically independent quantities: as $n$ increases, the typical error made by estimating the unknown state to be $\rho_{\mathcal A}$ scales as $n^{-1/2}$. However, the quadratic form defining the "shape" of the Gaussian depends, to leading order, only on $\tau$ and on the sequence of observables chosen, i.e. it is approximately the same for all data that may reasonably occur. Hence the distributions $p_{\mathcal A}(\rho)$ may be viewed as translated versions of each other. In order to have a manageable quantity at hand, we pick out the "average" distribution $p(\rho)$, defined by replacing the exponent in (2.4) by its expectation value (with respect to $\tau$). As we shall work out in detail below, it is peaked around $\tau$. An experimentalist who has performed $n$ measurements and inserted the data (2.3) into (2.4) will thus obtain a result very close to $p_{\mathcal A}(\rho) = p(\rho - \rho_{\mathcal A} + \tau)$, with $\rho_{\mathcal A} = \tau + O(n^{-1/2})$ being the best guess for the unknown state. The important point is that the measure of knowledge (or uncertainty) depends only on the "shape" of the Gaussian (i.e. on the quadratic form in the exponent), but not on the location $\rho_{\mathcal A}$ of its center. (The typical $\rho_{\mathcal A}$ occurring in different runs of the experiment are distributed according to $p(\rho)$.) It is in this sense that $p(\rho)$ is "typical" and is asymptotically approached by $p_{\mathcal A}(\rho)$ as $n \to \infty$.

We begin our derivation by considering the exponent in (2.4) as a function of the data $a_1, \ldots, a_n$ from (2.3). The probability distribution relevant for any $a_j$ is $a \mapsto w_a(\tau, A_j)$. Hence, we define

$$p(\rho) = \tilde C\,\mu(\rho)\exp\Big(\sum_{j=1}^n R_j(\rho)\Big), \tag{2.7}$$

where

$$R_j(\rho) = \sum_{a\in\mathrm{Sp}(A_j)} w_a(\tau, A_j)\ln w_a(\rho, A_j), \tag{2.8}$$

and $\tilde C$ is a normalization constant close to $C$. Next, we define quantities $\epsilon_{ja}(\rho)$ by

$$w_a(\rho, A_j) = \epsilon_{ja}(\rho) + w_a(\tau, A_j) \tag{2.9}$$

and write

$$R_j(\rho) = H_j + S_j(\rho), \tag{2.10}$$

where

$$H_j = \sum_{a\in\mathrm{Sp}(A_j)} w_a(\tau, A_j)\ln w_a(\tau, A_j) \tag{2.11}$$

is the negative of the Shannon information measure of the probability distribution $a \mapsto w_a(\tau, A_j)$ (it may be absorbed into the constant $\tilde C$), and

$$S_j(\rho) = \sum_{a\in\mathrm{Sp}(A_j)} w_a(\tau, A_j)\ln\Big(1 + \frac{\epsilon_{ja}(\rho)}{w_a(\tau, A_j)}\Big) \tag{2.12}$$
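is, according to (2.10), the negative of the relative entropy $S_j(\tau|\rho)$ of $a \mapsto w_a(\tau, A_j)$ with respect to $a \mapsto w_a(\rho, A_j)$. Assuming $\epsilon_{ja}(\rho)$ to be small ($\rho$ being close to $\tau$), we can expand

$$S_j(\rho) = -\frac12\sum_{a\in\mathrm{Sp}(A_j)}\frac{\epsilon_{ja}(\rho)^2}{w_a(\tau, A_j)} + O\Big(\sum_{a\in\mathrm{Sp}(A_j)}\frac{\epsilon_{ja}(\rho)^3}{w_a(\tau, A_j)^2}\Big), \tag{2.13}$$

where the first-order term $\sum_a \epsilon_{ja}(\rho)$ has dropped out because both distributions $a \mapsto w_a(\tau, A_j)$ and $a \mapsto w_a(\rho, A_j)$ are normalized, cf. the definition (2.9). The last term is somewhat delicate. It may be neglected if its denominator is non-zero and the number of measurements is sufficiently large. Hence, we would like to have $w_a(\tau, A_j) \neq 0$ for all $a \in \mathrm{Sp}(A_j)$ and all $j = 1, \ldots, n$. The simplest way to achieve this is to require

$$w_a(\tau, A) \neq 0 \quad \forall\, a \in \mathrm{Sp}(A) \tag{2.14}$$

for any observable $A$. By (2.2), this is equivalent to $\mathrm{Tr}(\tau P) \neq 0$ for any (non-zero) hermitean projection $P$, which just states that $\tau$ is invertible, i.e. that all eigenvalues of $\tau$ are non-zero. In finite dimensions, this is not a very drastic condition on the unknown state: it just states that $\tau$ lies in the interior of the set $S$ of states. From now on, we shall assume this to be the case. Thus, omitting the last term in (2.13) may be compensated by a correction factor of the order

$$\exp\Big(\pm\frac{n\,d}{3}\,\|\rho-\tau\|^3\,\|\tau^{-1}\|^2\Big) \tag{2.15}$$

or even closer to 1, $\|\cdot\|$ denoting the operator norm ($\|A\| = \max_{a\in\mathrm{Sp}(A)}|a|$). It may therefore be neglected if $\rho$ is sufficiently close to $\tau$. We will show below that this is the case in the region of interest.

Upon omitting the last term in (2.13) and re-inserting $\epsilon_{ja}(\rho)$ from (2.9), we arrive at the result that, for invertible $\tau$ and after sufficiently many measurements, the desired probability density is given by

$$p(\rho) = K\,\mu(\rho)\exp\Big(-\frac12\sum_{j=1}^n Q(\rho, A_j)\Big), \tag{2.16}$$

where

$$Q(\rho, A) = \sum_{a\in\mathrm{Sp}(A)}\frac{\big(w_a(\rho, A) - w_a(\tau, A)\big)^2}{w_a(\tau, A)}. \tag{2.17}$$

With $P_a$ denoting the spectral projections of $A$, this may also be written as

$$Q(\rho, A) = \sum_{a\in\mathrm{Sp}(A)}\frac{\big[\mathrm{Tr}\big((\rho-\tau)P_a\big)\big]^2}{\mathrm{Tr}(\tau P_a)} = \sum_{a\in\mathrm{Sp}(A)}\frac{\big[\mathrm{Tr}(\rho P_a)\big]^2}{\mathrm{Tr}(\tau P_a)} - 1, \tag{2.18}$$

a quantity which plays a key role in what follows. The constant $K$ in (2.16), collecting $\tilde C$ and the $\rho$-independent contribution (2.11), is chosen such that

$$\int_S \mathcal D\rho\; p(\rho) = 1. \tag{2.19}$$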
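The step from (2.12) to (2.16) rests on the quadratic expansion (2.13) of the negative relative entropy. A quick numerical check confirms that $S_j(\rho)$ and $-\frac12 Q(\rho, A_j)$ differ only at third order: shrinking the perturbation tenfold shrinks the discrepancy about a thousandfold. The three-outcome distribution standing in for $a \mapsto w_a(\tau, A_j)$ and the perturbation direction are made up for illustration; the perturbation sums to zero so that $w_a(\rho, A_j)$ stays normalized.

```python
import math


def neg_relative_entropy(w_tau, w_rho):
    """S_j(rho) = sum_a w_a(tau) ln(w_a(rho) / w_a(tau)), cf. (2.12)."""
    return sum(t * math.log(r / t) for t, r in zip(w_tau, w_rho))


def minus_half_q(w_tau, w_rho):
    """-(1/2) Q(rho, A_j) with Q from (2.17): the term kept in (2.16)."""
    return -0.5 * sum((r - t) ** 2 / t for t, r in zip(w_tau, w_rho))


w_tau = [0.5, 0.3, 0.2]
direction = [0.1, -0.04, -0.06]  # sums to zero, so w_rho stays normalized

errors = {}
for scale in (1e-1, 1e-2, 1e-3):
    w_rho = [t + scale * e for t, e in zip(w_tau, direction)]
    errors[scale] = abs(neg_relative_entropy(w_tau, w_rho) - minus_half_q(w_tau, w_rho))
    print(f"scale {scale:g}: discrepancy {errors[scale]:.3e}")
```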



The sum over the $Q$'s in the exponent of (2.16) defines a quadratic form on $S$. It may be written as

$$M(\rho) = \sum_{j=1}^n Q(\rho, A_j) = (\rho-\tau|\mathcal M|\rho-\tau), \tag{2.20}$$

where $\mathcal M$ is a linear operator acting on $\mathcal B_0$, the (real) vector space of hermitean linear operators with zero trace. Here $(\cdot|\cdot)$ denotes the Hilbert-Schmidt inner product

$$(\xi|\eta) = 2\,\mathrm{Tr}(\xi^*\eta) \tag{2.21}$$

for arbitrary linear operators $\xi$ and $\eta$, which induces a (real) inner product on $\mathcal B_0$. (The factor 2 is just for convenience: it ensures that for $d = 2$ the matrices $\sigma/2$ form an orthonormal basis of $\mathcal B_0$.) With respect to (2.21), the operator $\mathcal M$ is symmetric. We assume that there are enough independent observables among the $A_j$ to make $\mathcal M$ invertible. (In fact, the overall set of all spectral projections $\{P_{ja}\}$ must span the complete $d^2$-dimensional space of hermitean linear operators.) Hence $M(\rho)$ is a non-degenerate quadratic form, the exponential part in (2.16) being a distribution of Gaussian type peaked around $\tau$. When the number of measurements is increased, the peak becomes arbitrarily sharp, eventually coming to lie well inside the domain in which (2.15) may be replaced by 1. To see this in more detail, we consider a "typical" $\rho$, whose "distance" from $\tau$ corresponds to the RMS (root mean square) deviation of the Gaussian:

$$\|\rho_{\rm typical} - \tau\|^2 \le \mathrm{Tr}\big((\rho_{\rm typical} - \tau)^2\big) \sim \mathrm{Tr}(\mathcal M^{-1}). \tag{2.22}$$

(For the second step, cf. (3.3) below.) For large $n$, $\mathrm{Tr}(\mathcal M^{-1})$ becomes proportional to $n^{-1}$. Hence $n$ may be chosen large enough to make (2.15) arbitrarily close to 1 for any "typical" $\rho$. With increasing $n$, the approximation becomes arbitrarily accurate.

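In case of choosing $m$ observables $B_1, \ldots, B_m$ according to the scheme (2.6) and performing $n_\beta$ measurements of $B_\beta$, (2.18) and (2.20) combine into

$$M(\rho) = \sum_{\beta=1}^m n_\beta\,Q(\rho, B_\beta) = (\rho-\tau|\mathcal M|\rho-\tau). \tag{2.23}$$

In order to render $\mathcal M$ invertible, we must have $m \ge d+1$: the traceless parts of the spectral projections of each $B_\beta$ span at most $d-1$ dimensions (the projections sum to the identity), while $\dim\mathcal B_0 = d^2-1 = (d+1)(d-1)$. The lower bound $m = d+1$ may only be attained if the overall set of all spectral projections $\{P_{\beta b}\}$ spans the (real) vector space of hermitean linear operators, which implies that each $B_\beta$ has only non-degenerate eigenvalues. Formula (2.23) will turn out to be particularly useful later on.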

Now we have to say a few words about the a priori probability density $\mu$ contained in (2.16). We mainly focus on situations in which nothing, or very little, is known about $\tau$ before the measurements are carried out. One would then choose $\mu$ to be spread over the whole of $S$. Consequently, the functional dependence of $p(\rho)$ is dominated by the peak of the Gaussian. In particular, if $\mu$ is continuous at $\rho = \tau$, $\mu(\rho)$ may effectively be replaced by $\mu(\tau)$ for large $n$. Hence it is justified to ignore this factor, and we set $\mu(\rho) = 1$ for the rest of this paper.

Finally, the region of integration in (2.19) may effectively be replaced by the set of hermitean linear operators with trace unity, which is (as an affine space) isomorphic to $\mathbb R^{d^2-1}$. Thus we end up with the standard normalized Gaussian

$$p(\rho) = \frac{(\det\mathcal M)^{1/2}}{(2\pi)^{(d^2-1)/2}}\exp\Big(-\frac12(\rho-\tau|\mathcal M|\rho-\tau)\Big), \tag{2.24}$$

where $\mathcal M$ is the linear operator defined in (2.20) or, in the more convenient form, in (2.23). This operator, which depends only on $\tau$ and on the sequence of observables, is thus the key object allowing us to quantify the gain of knowledge in terms of a single numerical measure. We recall that, when the experimentalist inserts the measurement outcome data (2.3) into (2.4), he will obtain a probability distribution very close to a translated version of $p(\rho)$, i.e.

$$p_{\mathcal A}(\rho) \approx p(\rho - \rho_{\mathcal A} + \tau), \tag{2.25}$$

where $\rho_{\mathcal A}$ differs from $\tau$ by $O(n^{-1/2})$.
3 Measures of knowledge and best strategies in general

The distribution (2.24) is determined by the quadratic form (2.20) or (2.23), i.e. by the linear operator $\mathcal M$ on the (real) vector space of hermitean linear operators with trace 0. As described above, $\mathcal M$ contains all the information necessary to work out the experimentalist's knowledge (or uncertainty) about the unknown state once he knows the data. The only freedom left to him is the choice of the sequence of observables $A_j$. However, before searching for a strategy (i.e. a choice of observables) that maximizes this knowledge, we first have to specify how the "knowledge about the unknown state" (or, conversely, the "uncertainty about the unknown state") is quantified. The answer depends on which feature of the unknown state is of interest. We consider two possible approaches:

a.) Volume in $S$:

The peak of the Gaussian (2.24) occupies a "volume" in the set $S$ of states of the order

$$V = (\det\mathcal M)^{-1/2}, \tag{3.1}$$

which may be considered as a measure of uncertainty about $\tau$. This is not identical with, but plays a role similar to, Wootters and Fields' "uncertainty volume" [1] before the average over the possible unknown states is performed. It corresponds to the information-theoretic notion of knowledge, since it is related monotonically to the negative of the Shannon information measure

$$H = \int_S \mathcal D\rho\; p(\rho)\ln p(\rho) = -\frac{d^2-1}{2}\,(1 + \ln 2\pi) - \ln V. \tag{3.2}$$

A best strategy based on this measure (a best "volume oriented strategy") is one for which $\det\mathcal M$ is maximal for given $n$.

b.) Distance from $\tau$:

The RMS (root mean square) deviation of the distribution (2.24) is given by

$$D^2 = (\Delta\rho)^2 \equiv \int_S \mathcal D\rho\; p(\rho)\,\mathrm{Tr}\big((\rho-\tau)^2\big) = \frac12\,\mathrm{Tr}(\mathcal M^{-1}), \tag{3.3}$$

the factor $\frac12$ stemming from the factor 2 in the inner product (2.21). It represents the uncertainty about the unknown state as measured in terms of the mean "distance squared" $\mathrm{Tr}((\rho-\tau)^2)$ in the space $S$ of states and defines a "length" scale $D$. A best strategy based on this measure (a best "distance oriented strategy") is one for which $\mathrm{Tr}(\mathcal M^{-1})$ is minimal for given $n$.
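Both figures of merit are cheap to evaluate once $\mathcal M$ is known. For the $3\times 3$ case $d = 2$, the following sketch computes the volume (3.1) and $\mathrm{Tr}(\mathcal M^{-1})$, the quantity minimized by a distance oriented strategy, without any linear-algebra library. It uses the fact that the trace of the inverse of an invertible $3\times 3$ matrix equals the sum of its principal $2\times 2$ minors divided by the determinant. The function names are illustrative only.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def trace_inverse3(m):
    """Tr(m^{-1}) = (sum of the principal 2x2 minors) / det m, for invertible 3x3 m."""
    minors = (m[1][1] * m[2][2] - m[1][2] * m[2][1]
              + m[0][0] * m[2][2] - m[0][2] * m[2][0]
              + m[0][0] * m[1][1] - m[0][1] * m[1][0])
    return minors / det3(m)


def figures_of_merit(m):
    """Volume (3.1), V = (det M)^(-1/2), and Tr(M^{-1}) for a positive definite 3x3 M."""
    return 1 / math.sqrt(det3(m)), trace_inverse3(m)


volume, trace_inv = figures_of_merit([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 400.0]])
print(volume, trace_inv)  # 0.0005 0.0225
```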

It is easy to see that any best strategy based on maximizing $\det\mathcal M$ or minimizing $\mathrm{Tr}(\mathcal M^{-1})$ necessarily uses only observables $A_j$ with non-degenerate eigenvalues. At the level of our formalism, this feature may be traced back to the properties of the quadratic form $M(\rho)$, as given by (2.20) or (2.23), and of its constituents $Q(\rho, A)$, as defined in (2.17) and (2.18). We first note that $M(\rho)$ is a sum, each term stemming from a particular measurement. $Q(\rho, A)$ may thus be considered as a measure of how much our knowledge increases (on average) through a measurement of $A$. $M(\rho)$ has the important property that the contribution of an observable $A$ is the larger, the more spectral projections $A$ possesses. Let $A$ be one of the observables measured, and suppose it possesses a degenerate eigenvalue $a$. The corresponding eigenspace (the image of the spectral projection $P_a$) thus has dimension greater than 1. Suppose now that the measurement of $A$ is replaced by the measurement of another observable $A'$, constructed from $A$ by replacing $a P_a \to a' P_{a'} + a'' P_{a''}$ in the spectral decomposition of $A$, where $a' \neq a''$, both numbers are different from the other eigenvalues of $A$, and $P_{a'}$, $P_{a''}$ are orthogonal projections splitting the eigenspace into a direct sum: $P_a = P_{a'} + P_{a''}$. We can consider $A'$ as a "refinement" of $A$. Now we compare the two corresponding quantities $M(\rho)$ and $M'(\rho)$. Explicit computation reveals

$$M'(\rho) - M(\rho) = Q(\rho, A') - Q(\rho, A) = \frac{\big[\mathrm{Tr}(\tau P_{a'})\,\mathrm{Tr}\big((\rho-\tau)P_{a''}\big) - \mathrm{Tr}(\tau P_{a''})\,\mathrm{Tr}\big((\rho-\tau)P_{a'}\big)\big]^2}{\mathrm{Tr}(\tau P_{a'})\,\mathrm{Tr}(\tau P_{a''})\,\mathrm{Tr}\big(\tau(P_{a'} + P_{a''})\big)},$$

which is a positive semi-definite quadratic form in its own right. Consequently, $\det\mathcal M' \ge \det\mathcal M$ and $\mathrm{Tr}(\mathcal M'^{-1}) \le \mathrm{Tr}(\mathcal M^{-1})$, while the total number $n$ of measurements has not changed. The same procedure may be repeated until all degenerate eigenvalues of all observables $A_j$ have disappeared. (The same behaviour is expected for any other reasonable measure of knowledge.)
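The displayed identity is the elementary fact that, for real $x', x''$ and positive $w', w''$, $\frac{x'^2}{w'} + \frac{x''^2}{w''} - \frac{(x'+x'')^2}{w'+w''} = \frac{(w'x'' - w''x')^2}{w'w''(w'+w'')} \ge 0$, applied with $x = \mathrm{Tr}((\rho-\tau)P)$ and $w = \mathrm{Tr}(\tau P)$; all other terms of $Q$ cancel. A short numerical check with made-up sample values follows; the last sample has $x'/w' = x''/w''$, for which the gain vanishes.

```python
def refinement_gain(x1, x2, w1, w2):
    """Q(rho, A') - Q(rho, A) when P_a splits into P_a' + P_a''.

    x_i = Tr((rho - tau) P_i), w_i = Tr(tau P_i) > 0.
    """
    before = (x1 + x2) ** 2 / (w1 + w2)
    after = x1 ** 2 / w1 + x2 ** 2 / w2
    return after - before


def closed_form(x1, x2, w1, w2):
    """Right-hand side of the identity: a square over a positive denominator."""
    return (w1 * x2 - w2 * x1) ** 2 / (w1 * w2 * (w1 + w2))


samples = [(0.03, -0.01, 0.2, 0.1), (-0.2, 0.05, 0.05, 0.4), (0.1, 0.05, 0.3, 0.15)]
for s in samples:
    print(refinement_gain(*s), closed_form(*s))
```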

By construction, the best strategies depend on the unknown state $\tau$. Hence one may object that, since $\tau$ is unknown, the experimentalist does not know how to choose his observables. On the other hand, by inserting the outcomes of a relatively small number of measurements of arbitrary observables into (2.4), one obtains a first rough estimate of $\tau$. Next, one chooses observables according to a best strategy as if the estimate coincided with the unknown state. After some runs of this type (or even after each measurement), one determines a better estimate of $\tau$ and re-adjusts the observables. This procedure is iterated and will, for increasing $n$, approach the effectiveness of a best strategy.



4 The two-dimensional case 

Let us now study the case $d = 2$ in some detail. Since the states and observables on a two-dimensional Hilbert space admit a simple geometric representation, we can be more explicit than in the case of general $d$. The set of all density matrices may be parametrized as

$$\rho(a) = \frac12(1 + a\cdot\sigma) \quad\text{with } |a| \le 1, \tag{4.1}$$

where $\sigma$ represents three observables obeying the Pauli spin matrix algebra, and $a \in \mathbb R^3$. Pure states are characterized by $|a| = 1$; the tracial state is given by $a = 0$. The space $S$ of states is thus represented by the unit ball in $\mathbb R^3$. The natural measure $\mathcal D\rho$ on $S$ is the Euclidean volume element $d^3a$.

Now let us look at observables. Any hermitean linear operator may be written as $\alpha 1 + c\cdot\sigma$ with $\alpha \in \mathbb R$ and $c \in \mathbb R^3$. Leaving aside multiples of the identity and irrelevant multiplicative factors, we confine ourselves to measuring observables of the type

$$B(c) = c\cdot\sigma \quad\text{with } |c| = 1. \tag{4.2}$$

The spectrum of any such operator is $\{-1, 1\}$. The spectral projection corresponding to the eigenvalue $b \in \{-1, 1\}$ of $B(c)$ takes the convenient form $P_b = \frac12(1 + b\,c\cdot\sigma)$, and the measurement outcome probabilities for this observable in the state $\rho(a)$ read

$$w_b\big(\rho(a), B(c)\big) = \frac12(1 + b\,a\cdot c). \tag{4.3}$$

We now specify our sequence of observables according to the scheme (2.6): we choose $m$ unit vectors $c_\beta$ ($\beta = 1, \ldots, m$) and perform $n_\beta$ measurements of each $B_\beta = B(c_\beta)$. The total number of measurements is therefore $n = \sum_{\beta=1}^m n_\beta$. The unknown state shall be represented by the parameter value $u$, i.e.

$$\tau = \rho(u), \tag{4.4}$$

and $p(\rho)$ is written as $p(a)$. Using (2.23), a short computation reveals that the probability distribution (2.24) is given by

$$p(a) = \frac{(\det\mathcal M)^{1/2}}{(2\pi)^{3/2}}\exp\Big(-\frac12(a-u)^T\mathcal M(a-u)\Big), \tag{4.5}$$

where $\mathcal M$ is the $3\times 3$ matrix with components

$$\mathcal M_{rs} = \sum_{\beta=1}^m \frac{n_\beta}{1-(u\cdot c_\beta)^2}\,c_{\beta r}c_{\beta s}, \tag{4.6}$$

$c_{\beta r}$ being the components of the vector $c_\beta$, with $r$ and $s$ ranging from 1 to 3. Starting from this expression, we will now tackle the problem of finding all best strategies for the determination of the unknown state $\rho(u)$.
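Formula (4.6) can be cross-checked against the general definitions (2.17) and (2.23): for any $a$, the quadratic form $(a-u)^T\mathcal M(a-u)$ must coincide with $\sum_\beta n_\beta\,Q(\rho(a), B(c_\beta))$ computed directly from the outcome probabilities (4.3). The sketch below does this for arbitrarily chosen (hypothetical) values of $u$, $a$, $c_\beta$ and $n_\beta$; the three unit vectors are linearly independent, as required.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def w(b, a, c):
    """Outcome probability (4.3): w_b(rho(a), B(c)) = (1 + b a.c) / 2."""
    return (1 + b * dot(a, c)) / 2


def q_direct(a, u, c):
    """Q(rho(a), B(c)) from (2.17), with tau = rho(u)."""
    return sum((w(b, a, c) - w(b, u, c)) ** 2 / w(b, u, c) for b in (+1, -1))


def m_matrix(u, cs, ns):
    """3x3 matrix (4.6): M_rs = sum_beta n_beta c_r c_s / (1 - (u.c_beta)^2)."""
    return [[sum(n * c[r] * c[s] / (1 - dot(u, c) ** 2) for c, n in zip(cs, ns))
             for s in range(3)] for r in range(3)]


u = [0.2, -0.1, 0.5]
cs = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.6, 0.0, 0.8]]  # unit vectors, rank 3
ns = [5, 7, 11]
a = [0.25, -0.05, 0.45]

lhs = sum(n * q_direct(a, u, c) for c, n in zip(cs, ns))
d = [x - y for x, y in zip(a, u)]
M = m_matrix(u, cs, ns)
rhs = sum(d[r] * M[r][s] * d[s] for r in range(3) for s in range(3))
print(lhs, rhs)
```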



5 Best strategies for d = 2 

The form of (4.6) shows that we must have $m \ge 3$, and the sequence of vectors $c_\beta$ must contain three linearly independent elements (otherwise $\mathcal M$ would not be invertible). In other words, the $m\times 3$ matrix defined by the components $c_{\beta r}$ must have rank 3. According to the two measures of knowledge discussed in section 3, we consider the two cases of maximizing $\det\mathcal M$ and minimizing $\mathrm{Tr}(\mathcal M^{-1})$.

a.) Maximizing $\det\mathcal M$:

We first consider the volume oriented approach, i.e. the case in which the "volume" $V$ in $S$ occupied by the peak of the Gaussian (4.5) serves as a measure of the uncertainty about the unknown state. Let us fix $n$ (and, for the moment, $m$) and ask for which configurations $(n_\beta, c_\beta)$ the determinant of $\mathcal M$ is maximal under the subsidiary conditions $\sum_{\beta=1}^m n_\beta = n$ and $|c_\beta| = 1$ for all $\beta$. Introducing Lagrange multipliers $c$, $C_\beta$, the corresponding unconstrained problem is to maximize

$$F = \ln\det\mathcal M - c\sum_{\beta=1}^m n_\beta - \frac12\sum_{\beta=1}^m C_\beta\,c_\beta^2 \tag{5.1}$$

with respect to the variables $(n_\beta, c_\beta, c, C_\beta)$. The logarithm is used just for convenience: this form allows us to apply the general formula $\partial(\ln\det\mathcal M) = \mathrm{Tr}(\mathcal M^{-1}\partial\mathcal M)$, where $\partial$ stands for any derivative $\partial/\partial c_{\beta r}$ or $\partial/\partial n_\beta$. Now we choose the coordinates in $\mathbb R^3$ such that $\mathcal M$ is diagonal in the maximizing configuration. This choice is possible because $\mathcal M$ is a real symmetric matrix, and it causes all non-diagonal elements to drop out of the problem. Differentiation with respect to $n_\beta$ and $c_{\beta r}$ leads to the set of equations

$$\frac{1}{1-(u\cdot c_\beta)^2}\sum_{s=1}^3\frac{c_{\beta s}^2}{\mathcal M_{ss}} = c, \tag{5.2}$$

$$\frac{2n_\beta c_{\beta r}}{\big(1-(u\cdot c_\beta)^2\big)\mathcal M_{rr}} + \frac{2n_\beta(u\cdot c_\beta)u_r}{\big(1-(u\cdot c_\beta)^2\big)^2}\sum_{s=1}^3\frac{c_{\beta s}^2}{\mathcal M_{ss}} = C_\beta c_{\beta r}, \tag{5.3}$$

whose combination yields

$$\frac{2n_\beta}{1-(u\cdot c_\beta)^2}\Big(\frac{c_{\beta r}}{\mathcal M_{rr}} + c\,(u\cdot c_\beta)u_r\Big) = C_\beta c_{\beta r}. \tag{5.4}$$

Multiplying this equation by $c_{\beta r}$, summing over $r$ and using (5.2) and $|c_\beta| = 1$ gives

$$C_\beta = \frac{2n_\beta c}{1-(u\cdot c_\beta)^2}. \tag{5.5}$$

Multiplying (5.2) by $n_\beta$, summing over $\beta$ and using (4.6) and $\sum_\beta n_\beta = n$ leads to

$$c = \frac3n. \tag{5.6}$$

Upon inserting these last two expressions into (5.4), we find

$$c_{\beta r}\Big(\frac{1}{\mathcal M_{rr}} - \frac3n\Big) + \frac3n(u\cdot c_\beta)u_r = 0. \tag{5.7}$$

Since the $m\times 3$ matrix defined by $c_{\beta r}$ has rank 3 (as argued at the beginning of this section), the term $(\mathcal M_{rr})^{-1} - 3n^{-1}$ must vanish for at least two values of $r$. (Otherwise one could divide by these terms for two values of $r$ and conclude that the two corresponding columns $c_{\beta r}$ are both proportional to $(u\cdot c_\beta)_\beta$, so that the matrix $c_{\beta r}$ would have rank less than 3.) We may choose the coordinates of $\mathbb R^3$ such that these values are $r = 1$ and $2$. Hence $\mathcal M_{11} = \mathcal M_{22} = \frac n3$, which implies $u_1 = u_2 = 0$, $u\cdot c_\beta = u_3 c_{\beta 3}$ and $u_3^2 = u^2$ (with $u^2 \equiv |u|^2$). Equation (5.7) thus shrinks to the statement that $\mathcal M_{33} = \frac n3(1-u^2)^{-1}$, and the remaining equation (5.2) is automatically satisfied. In this way we arrive at the following
Lemma 1:

For given $u$ and $n$, the configuration $(m, n_\beta, c_\beta)$ maximizes $\det\mathcal M$ under the subsidiary conditions $\sum_{\beta=1}^m n_\beta = n$ and $|c_\beta| = 1$ for all $\beta$ if and only if

(i) $u$ is an eigenvector of $\mathcal M$ associated with the eigenvalue $\frac n3(1-u^2)^{-1}$, and

(ii) the two other eigenvalues of $\mathcal M$ are both equal to $\frac n3$.

The second statement implies that $\mathcal M$ acts proportionally to the identity in the subspace orthogonal to $u$. The value of $\det\mathcal M$ in the maximizing configuration is given by

$$(\det\mathcal M)_{\max} = \Big(\frac n3\Big)^3\frac{1}{1-u^2}, \tag{5.8}$$

or, expressed in terms of the "volume" $V = (\det\mathcal M)^{-1/2}$ occupied by the peak of the Gaussian,

$$V_{\min} = \Big(\frac3n\Big)^{3/2}\sqrt{1-u^2}. \tag{5.9}$$

Any configuration satisfying (i) and (ii) represents a "best strategy", and all these strategies work equally well, because (5.8) depends only on $n$ and $u$, but not on any details of the configuration $(m, n_\beta, c_\beta)$. The simplest strategy is to choose $m = 3$ and to let $\{c_1, c_2, c_3\}$ be an orthonormal basis of $\mathbb R^3$ such that one of these vectors ($c_3$, say) is parallel to $u$. In this strategy, we must have $n_1 = n_2 = n_3 = \frac n3$, i.e. all three observables $B_\beta = B(c_\beta)$ are measured equally often.
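Lemma 1 can be probed numerically. The sketch below uses hypothetical values $n = 300$ and $u = (0, 0, 0.8)$ (coordinates chosen so that $u$ lies along the third axis), builds $\mathcal M$ from (4.6) for the simplest best strategy, compares $\det\mathcal M$ with (5.8) and $V$ with (5.9), and checks that an unequal allocation and a basis tilted by an arbitrary angle both give a smaller determinant. This is a spot check, not a proof of optimality.

```python
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def m_matrix(u, cs, ns):
    """3x3 matrix (4.6)."""
    return [[sum(n * c[r] * c[s] / (1 - dot(u, c) ** 2) for c, n in zip(cs, ns))
             for s in range(3)] for r in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


n = 300
u = [0.0, 0.0, 0.8]
u2 = dot(u, u)
theta = 0.3  # tilt angle for the comparison basis (arbitrary)

aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # c_3 parallel to u
tilted = [[1.0, 0.0, 0.0],
          [0.0, math.cos(theta), math.sin(theta)],
          [0.0, -math.sin(theta), math.cos(theta)]]

best = det3(m_matrix(u, aligned, [n / 3] * 3))
unequal = det3(m_matrix(u, aligned, [90, 90, 120]))
tilted_det = det3(m_matrix(u, tilted, [n / 3] * 3))
predicted = (n / 3) ** 3 / (1 - u2)         # eq. (5.8)
v_min = (3 / n) ** 1.5 * math.sqrt(1 - u2)  # eq. (5.9)

print(best, predicted, unequal, tilted_det)
```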

b.) Minimizing $\mathrm{Tr}(\mathcal M^{-1})$:

The distance oriented approach, i.e. the case in which the mean "distance squared" from the center of the Gaussian (4.5) serves as a measure of the uncertainty about the unknown state, is treated similarly. Formally, the problem consists of minimizing

$$F = \mathrm{Tr}(\mathcal M^{-1}) + c\sum_{\beta=1}^m n_\beta + \frac12\sum_{\beta=1}^m C_\beta\,c_\beta^2, \tag{5.10}$$

where $c$ and $C_\beta$ are Lagrange multipliers. We again choose the coordinates in $\mathbb R^3$ such that $\mathcal M$ is diagonal in the minimizing configuration and use the general formula $\partial\,\mathrm{Tr}(\mathcal M^{-1}) = -\mathrm{Tr}\big(\mathcal M^{-1}(\partial\mathcal M)\mathcal M^{-1}\big)$, where $\partial$ stands for $\partial/\partial c_{\beta r}$ or $\partial/\partial n_\beta$. Differentiation yields a set of equations that look like (5.2) and (5.3), except that the diagonal elements $\mathcal M_{rr}$ and $\mathcal M_{ss}$ are replaced by their squares, and the same applies to the analogue of (5.4). Equation (5.5) appears without change, but the analogue of (5.6) now takes the form

$$c = \frac1n\sum_{s=1}^3(\mathcal M_{ss})^{-1}, \tag{5.11}$$

due to an additional appearance of $(\mathcal M_{ss})^{-1}$ in the analogue of (5.2). Hence the analogue of (5.7) becomes

$$c_{\beta r}\Big(\frac{1}{\mathcal M_{rr}^2} - c\Big) + c\,(u\cdot c_\beta)u_r = 0, \tag{5.12}$$

with $c$ from (5.11). Following the same logic as before, the term $(\mathcal M_{rr})^{-2} - c$ must vanish for at least two values of $r$ (which we choose to be 1 and 2). This implies $u_1 = u_2 = 0$ and

$$\frac{1}{\mathcal M_{11}^2} = \frac{1}{\mathcal M_{22}^2} = c, \qquad \frac{1}{\mathcal M_{33}^2} = c\,(1-u^2). \tag{5.13}$$

Combining these equations with (5.11), we may easily compute the diagonal elements $\mathcal M_{rr}$, i.e. the eigenvalues of $\mathcal M$ (displayed below). The remaining equation (the analogue of (5.2)) is then automatically satisfied. Our result thus reads:

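Lemma 2:

For given $u$ and $n$, the configuration $(m, n_\beta, c_\beta)$ minimizes $\mathrm{Tr}(\mathcal M^{-1})$ under the subsidiary conditions $\sum_{\beta=1}^m n_\beta = n$ and $|c_\beta| = 1$ for all $\beta$ if and only if

(i) $u$ is an eigenvector of $\mathcal M$ associated with the eigenvalue

$$\frac{n}{\big(2 + \sqrt{1-u^2}\big)\sqrt{1-u^2}}, \tag{5.14}$$

and

(ii) the two other eigenvalues of $\mathcal M$ are both equal to

$$\frac{n}{2 + \sqrt{1-u^2}}. \tag{5.15}$$

The second statement implies that $\mathcal M$ acts proportionally to the identity in the subspace orthogonal to $u$. The value of $\mathrm{Tr}(\mathcal M^{-1})$ in the minimizing configuration is given by

$$\mathrm{Tr}(\mathcal M^{-1})_{\min} = \frac{\big(2 + \sqrt{1-u^2}\big)^2}{n}, \qquad D^2_{\min} = \frac12\,\mathrm{Tr}(\mathcal M^{-1})_{\min}. \tag{5.16}$$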
Any configuration satisfying (i) and (ii) represents a "best strategy", and all these strategies work equally well. The simplest one is to choose $m = 3$ and to let $\{c_1, c_2, c_3\}$ be an orthonormal basis of $\mathbb R^3$ such that one of these vectors ($c_3$, say) is parallel to $u$. The numbers of measurements performed for the three observables $B_\beta = B(c_\beta)$ must now be chosen as

$$n_1 = n_2 = \frac{n}{2 + \sqrt{1-u^2}}, \qquad n_3 = \frac{n\sqrt{1-u^2}}{2 + \sqrt{1-u^2}}, \tag{5.17}$$

and they correctly sum up to $n$. The observable aligned with $u$ thus requires fewer measurements than the others.
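The allocation (5.17) and the minimum (5.16) are easy to confirm for the orthonormal-basis strategy, where (4.6) gives $\mathcal M = \mathrm{diag}\big(n_1, n_2, n_3/(1-u^2)\big)$ and hence $\mathrm{Tr}(\mathcal M^{-1}) = 1/n_1 + 1/n_2 + (1-u^2)/n_3$. The sketch below (sample values $n = 300$ and $|u| = 0.9$, chosen arbitrarily) checks that the counts sum to $n$, that they reproduce (5.16), and that shifting a few measurements away from (5.17), or using the equal split of the volume oriented strategy, increases $\mathrm{Tr}(\mathcal M^{-1})$.

```python
import math


def lemma2_allocation(n, u_norm):
    """Counts (5.17) for an orthonormal basis with c_3 parallel to u."""
    s = math.sqrt(1 - u_norm ** 2)
    return n / (2 + s), n / (2 + s), n * s / (2 + s)


def trace_inverse(n1, n2, n3, u_norm):
    """Tr(M^{-1}) for M = diag(n1, n2, n3 / (1 - u^2)), cf. (4.6)."""
    return 1 / n1 + 1 / n2 + (1 - u_norm ** 2) / n3


n, u_norm = 300, 0.9
n1, n2, n3 = lemma2_allocation(n, u_norm)
optimum = trace_inverse(n1, n2, n3, u_norm)
predicted = (2 + math.sqrt(1 - u_norm ** 2)) ** 2 / n  # eq. (5.16)
shifted = trace_inverse(n1 - 5, n2, n3 + 5, u_norm)
equal_split = trace_inverse(n / 3, n / 3, n / 3, u_norm)

print(n1 + n2 + n3, optimum, predicted, shifted, equal_split)
```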

Comparing a.) and b.):

The knowledge about the unknown state after $n$ optimally chosen measurements is given by the volume (5.9) and the length squared (5.16), respectively. For small $|u|$, the two methods work roughly equally well: in both cases, the three eigenvalues of $\mathcal M$ are approximately of the same order, the spread of the Gaussian thus being roughly the same in all directions in $S$. If, however, $|u|$ is close to 1 (i.e. $\tau$ is almost pure), one eigenvalue of $\mathcal M$ becomes large in both cases, so that the peak is spread only very little in the direction of $u$. In this situation the "volume" oriented approach is more efficient: in the limit $|u| \to 1$ for fixed $n$ we have $V_{\min} \to 0$, whereas $D^2_{\min} = \frac12\mathrm{Tr}(\mathcal M^{-1})_{\min}$ tends to the finite value $2n^{-1}$.

In both cases, the strategy works as follows. Inserting the outcomes of a relatively small number of measurements of arbitrary observables into (2.4) gives a first rough estimate of $\tau$, i.e. of $u$. Next one chooses an orthonormal basis $\{c_1, c_2, c_3\}$ of $\mathbb{R}^3$ such that $c_3$ is parallel to the best guess of $u$. One then measures the three corresponding observables $B(c_\beta)$, with the relative numbers of measurements depending on whether $V$ or $D^2$ represents the measure of uncertainty. After some runs of this type (or even after each measurement), one determines a better guess of $u$ and re-adjusts the three vectors accordingly: $c_3^{\rm new}$ is aligned with the new guess of $u$, and $c_1^{\rm new}$ and $c_2^{\rm new}$ are kept as close to $c_1$ and $c_2$ as possible. This procedure is iterated and, for increasing $n$, will converge to an orthonormal basis representing a "best strategy" as determined above. In other words, for sufficiently large $n$ we expect the bounds (5.9) or (5.16), respectively, to be approached arbitrarily well.
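The iterative procedure just described can be sketched as a small simulation. This is a hedged illustration and not the authors' method, for three reasons:

- The estimator (2.4) is not reproduced here, so the sketch uses a least-squares linear inversion of the outcomes instead.
- It re-aligns the frame with a simple Gram-Schmidt step, which is only one possible reading of "as close as possible".
- It uses the distance-oriented split (5.17) with the current guess of $|u|$.

All names and parameters (`adaptive_run`, `batch`, `seed`, the true Bloch vector) are hypothetical.

```python
import numpy as np

def realign(c1_old, u_hat):
    """New orthonormal frame: c3 along u_hat, c1 = old c1 with its c3-component removed."""
    c3 = u_hat / np.linalg.norm(u_hat)
    v = c1_old - np.dot(c1_old, c3) * c3
    if np.linalg.norm(v) < 1e-9:                 # old c1 happens to be parallel to the new c3
        v = np.cross(c3, [1.0, 0.0, 0.0])
        if np.linalg.norm(v) < 1e-9:
            v = np.cross(c3, [0.0, 1.0, 0.0])
    c1 = v / np.linalg.norm(v)
    c2 = np.cross(c3, c1)                        # right-handed frame, close to the old c2
    return [c1, c2, c3]

def split(batch, u_norm):
    """Distance-oriented split (5.17) of one batch, using the current guess of |u|."""
    s = np.sqrt(1.0 - min(u_norm, 0.999) ** 2)
    n12 = int(round(batch / (2.0 + s)))
    return [n12, n12, batch - 2 * n12]

def adaptive_run(u_true, batches=20, batch=1500, seed=1):
    rng = np.random.default_rng(seed)
    frame = [np.eye(3)[k] for k in range(3)]     # arbitrary first frame
    A = np.zeros((3, 3))                         # sum over shots of c c^T
    b = np.zeros(3)                              # sum over shots of outcome * c
    u_hat = np.zeros(3)
    for _ in range(batches):
        for c, k in zip(frame, split(batch, np.linalg.norm(u_hat))):
            p_plus = 0.5 * (1.0 + np.dot(u_true, c))   # Prob(+1) when measuring c . sigma
            outcomes = np.where(rng.random(k) < p_plus, 1.0, -1.0)
            A += k * np.outer(c, c)
            b += outcomes.sum() * c
        u_hat = np.linalg.solve(A, b)            # least-squares linear inversion
        if np.linalg.norm(u_hat) > 1e-9:
            frame = realign(frame[0], u_hat)
    return u_hat, frame

u_true = np.array([0.0, 0.6, 0.6])
u_hat, frame = adaptive_run(u_true)
print(u_hat, frame[2])
```

After a few batches, $c_3$ points along the true $u$, and the split settles at the values (5.17) for $|u| \approx 0.85$.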
6 Comparison of strategies in higher dimensions



In this section we consider the case of higher dimensional Hilbert spaces. After presenting a generally applicable method to improve strategies, we show how concrete strategies may be constructed. It turns out that a strategy based on mutually unbiased (complementary) observables is not always optimal. Finally, we give some remarks about the infinite dimensional case.

General formalism 

We now turn to higher dimensions. Let the dimension $d$ of $\mathcal{H}$ be arbitrary. By $\mathcal{B}$ we denote the (complex) vector space of all linear operators on $\mathcal{H}$, endowed with the Hilbert-Schmidt inner product (2.21). This makes $\mathcal{B}$ a $d^2$-dimensional Hilbert space in its own right. For this space we use a bra-ket notation with round brackets: $|\xi)(\eta|$ represents the linear operator $\mathcal{B} \to \mathcal{B}$ sending $\zeta \mapsto |\xi)(\eta|\zeta)$ or, equivalently, $\zeta \mapsto 2\,\mathrm{Tr}(\eta^\dagger\zeta)\,\xi$. The determinant and trace of linear operators $\mathcal{B} \to \mathcal{B}$ will be denoted by $\mathrm{det}_\oplus$ and $\mathrm{Tr}_\oplus$, respectively. Furthermore, we need a component formalism for operators of this type. If $\{e_I \mid I = 1, \ldots, d\}$ is an orthonormal basis of $\mathcal{H}$, the linear operators ("matrix units")

$$ e_{IJ} = |e_I\rangle\langle e_J| : \mathcal{H} \to \mathcal{H} \tag{6.1} $$

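form a basis of $\mathcal{B}$, satisfying $(e_{IJ}|e_{KL}) = 2\,\delta_{IK}\delta_{JL}$. Along with the expansion of elements $\xi \in \mathcal{B}$ as

$$ \xi = \sum_{I,J} \xi_{IJ}\, e_{IJ} = \sum_{I,J} |e_I\rangle\, \xi_{IJ}\, \langle e_J| \quad\text{with}\quad \xi_{IJ} = \langle e_I|\xi|e_J\rangle , \tag{6.2} $$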

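any linear operator $A : \mathcal{B} \to \mathcal{B}$ may be written as

$$ A = \frac{1}{2}\sum_{IJ,KL} |e_{IJ})\, A_{IJ,KL}\, (e_{KL}| \quad\text{with}\quad A_{IJ,KL} = \frac{1}{2}\,(e_{IJ}|A|e_{KL}) . \tag{6.3} $$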

In terms of these components, the action of $A$ is represented by matrix multiplication. If the double index $IJ$ is read as a single index $r$, the components $A_{IJ,KL}$ define a $d^2\times d^2$ matrix $A_{rs}$, in which the determinant and the trace take their usual form. If $A = |\xi)(\eta|$, we have

$$ A_{IJ,KL} = 2\,\xi_{IJ}\,\overline{\eta_{KL}} . \tag{6.4} $$
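The normalisation conventions of (6.3) and (6.4) are easy to get wrong by a factor of 2, so a quick numerical check may help. The sketch assumes $(x|y) = 2\,\mathrm{Tr}(x^\dagger y)$, as read off from $(e_{IJ}|e_{KL}) = 2\delta_{IK}\delta_{JL}$ and from the map $\zeta \mapsto 2\,\mathrm{Tr}(\eta^\dagger\zeta)\,\xi$. It builds the components of $|\xi)(\eta|$ for random $\xi$ and $\eta$, then confirms (6.4) and the matrix-multiplication rule. The helper names are hypothetical.

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

def hs(x, y):
    """Inner product on B in the normalisation (x|y) = 2 Tr(x^dagger y)."""
    return 2.0 * np.trace(x.conj().T @ y)

def unit(I, J):
    """Matrix unit e_IJ = |e_I><e_J| as in (6.1)."""
    e = np.zeros((d, d), dtype=complex)
    e[I, J] = 1.0
    return e

def random_operator():
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

xi, eta, zeta = random_operator(), random_operator(), random_operator()

def A(z):
    """The operator |xi)(eta| : B -> B, z -> (eta|z) xi."""
    return hs(eta, z) * xi

# components (6.3), the double index IJ flattened to r = I*d + J
pairs = [(I, J) for I in range(d) for J in range(d)]
comp = np.array([[0.5 * hs(unit(I, J), A(unit(K, L))) for (K, L) in pairs]
                 for (I, J) in pairs])

predicted = 2.0 * np.outer(xi.reshape(-1), eta.reshape(-1).conj())   # (6.4)
acted = comp @ zeta.reshape(-1)        # matrix multiplication on the components zeta_KL
```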
The orthogonal projection onto the normalized element $(2d)^{-1/2}\,1$,

$$ \mathcal{V} = \frac{1}{2d}\,|1)(1| : \mathcal{B} \to \mathcal{B} \tag{6.5} $$

($1$ denoting the unit operator on $\mathcal{H}$), has components $\mathcal{V}_{IJ,KL} = d^{-1}\delta_{IJ}\delta_{KL}$.

By $\mathcal{B}_0$ we denote the subset of $\mathcal{B}$ consisting of all hermitean linear operators with zero trace. It is a real vector space of dimension $d^2 - 1$, and the Hilbert-Schmidt inner product of any pair of its elements is real. The determinant and trace of linear operators $\mathcal{B}_0 \to \mathcal{B}_0$ are denoted by $\det$ and $\mathrm{Tr}$, respectively.

We now consider a strategy based on the scheme (2.6). It consists of a collection of operators $B_\beta$ ($\beta = 1, \ldots, m$), where each $B_\beta$ is measured $n_\beta$ times, each time in a separate copy of the system, and $\sum_{\beta=1}^{m} n_\beta = n$. We assume $n_\beta \gg 1$ for each $\beta$. (As noted above, sufficiently large $n_\beta$ may be achieved by replacing $n_\beta \to k\,n_\beta$ for sufficiently large $k$, while keeping $m$ constant.) The key object describing the quality of the strategy is the symmetric linear operator $\mathcal{M} : \mathcal{B}_0 \to \mathcal{B}_0$. It is defined in (2.23) and appears in the Gaussian (2.24). As may be read off from (2.23) and (2.18), any observable $A$ provides a contribution

$$ Q(A) = \frac{1}{2}\sum_a \frac{|P_a)(P_a|}{(\tau|P_a)} , \tag{6.6} $$

where $P_a$ is the spectral projection of $A$ with respect to the eigenvalue $a$. However, when written in the above form, any such object is a hermitean linear operator $Q(A) : \mathcal{B} \to \mathcal{B}$ that does not leave $\mathcal{B}_0$ invariant. Its components are given by

$$ Q_{IJ,KL}(A) = \sum_a \frac{P_{a,IJ}\,\overline{P_{a,KL}}}{(\tau|P_a)} , \tag{6.7} $$

where $P_{a,IJ}$ are the components of $P_a$. From now on, we assume that the orthonormal basis $\{e_I\}$ consists of eigenvectors of $\tau$. As a consequence, the matrix $\tau_{IJ}$ is diagonal, and the denominator in (6.7) is $(\tau|P_a) = 2\sum_I \tau_{II} P_{a,II}$.

Summing (6.6) over the observables $B_\beta$ yields a hermitean operator acting on $\mathcal{B}$. It will be convenient to generalize it to a family of operators $\mathcal{M}_\oplus(\alpha) : \mathcal{B} \to \mathcal{B}$, defined as

$$ \mathcal{M}_\oplus(\alpha) = \sum_{\beta=1}^{m} n_\beta\, Q(B_\beta) + \alpha\,\mathcal{V} , \tag{6.8} $$
where $\mathcal{V}$ is given by (6.5). Since $(1|\xi) = 2\,\mathrm{Tr}(\xi) = 0$ for any traceless $\xi$, we have $(\xi|\mathcal{M}_\oplus(\alpha)|\eta) = (\xi|\mathcal{M}|\eta)$ for all $\xi, \eta \in \mathcal{B}_0$. This establishes the relation between $\mathcal{M}_\oplus(\alpha)$ and the original object $\mathcal{M} : \mathcal{B}_0 \to \mathcal{B}_0$.

We will now express our two measures of knowledge, (3.1) and (3.3), in terms of $\mathcal{M}_\oplus(\alpha)$. Since $\mathcal{V}$ is the (one-dimensional) hermitean projection onto the orthogonal complement of $\mathcal{B}_0$, (6.8) tells us that

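$$ \mathrm{det}_\oplus\, \mathcal{M}_\oplus(\alpha) = \alpha\,\det\mathcal{M} + \text{terms independent of } \alpha . \tag{6.9} $$

From this it follows that

$$ \det\mathcal{M} = \lim_{\alpha\to\infty} \frac{\mathrm{det}_\oplus\, \mathcal{M}_\oplus(\alpha)}{\alpha} , \tag{6.10} $$

and analogously we conclude

$$ \mathrm{Tr}(\mathcal{M}^{-1}) = \lim_{\alpha\to\infty} \mathrm{Tr}_\oplus\big(\mathcal{M}_\oplus(\alpha)^{-1}\big) . \tag{6.11} $$

These two quantities are all we need in order to compare strategies.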
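The limits (6.10) and (6.11) can be illustrated numerically for $d = 3$. The sketch below is a toy built on several assumptions that are not part of the text:

- The strategy consists of a few random non-degenerate observables with hypothetical weights.
- $\mathcal{M}$ on $\mathcal{B}_0$ is represented in an orthonormal basis of traceless hermitean matrices (Gell-Mann type, rescaled to $(x|y) = 2\,\mathrm{Tr}(x^\dagger y)$).
- A single large value of $\alpha$ stands in for the limit.

```python
import numpy as np

d = 3
rng = np.random.default_rng(2)
tau = np.diag([0.5, 0.3, 0.2]).astype(complex)   # unknown state, diagonal in {e_I}

def hs(x, y):
    """(x|y) = 2 Tr(x^dagger y)."""
    return 2.0 * np.trace(x.conj().T @ y)

def random_projections():
    """Rank-one spectral projections of a random non-degenerate observable."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return [np.outer(q[:, a], q[:, a].conj()) for a in range(d)]

def Q_components(projs):
    """Components (6.7) of Q(A), double index IJ flattened to r = I*d + J."""
    return sum(np.outer(P.reshape(-1), P.reshape(-1).conj()) / hs(tau, P).real
               for P in projs)

strategy = [(random_projections(), 400) for _ in range(2 * d)]   # (projections, n_beta)
M0 = sum(n_b * Q_components(p) for p, n_b in strategy)          # components of M_plus(0)
V = np.outer(np.eye(d).reshape(-1), np.eye(d).reshape(-1)) / d  # V_{IJ,KL} = delta_IJ delta_KL / d

# orthonormal basis of B_0 for (x|y): off-diagonal and diagonal Gell-Mann-type matrices
basis = []
for I in range(d):
    for J in range(I + 1, d):
        sym = np.zeros((d, d), complex); sym[I, J] = sym[J, I] = 1.0
        asym = np.zeros((d, d), complex); asym[I, J] = -1j; asym[J, I] = 1j
        basis += [sym / 2.0, asym / 2.0]
for k in range(1, d):
    h = np.diag([1.0] * k + [-float(k)] + [0.0] * (d - k - 1)).astype(complex)
    basis.append(h / np.sqrt(2.0 * k * (k + 1)))

def act(Mc, x):
    """Apply an operator given by its components to x (matrix multiplication on x_IJ)."""
    return (Mc @ x.reshape(-1)).reshape(d, d)

G = np.array([[hs(g, act(M0, h)) for h in basis] for g in basis]).real   # M restricted to B_0

alpha = 1e8
M_alpha = M0 + alpha * V
det_limit = (np.linalg.det(M_alpha) / alpha).real      # (6.10) at one large alpha
tr_limit = np.trace(np.linalg.inv(M_alpha)).real        # (6.11) at one large alpha
print(np.linalg.det(G), det_limit, np.trace(np.linalg.inv(G)), tr_limit)
```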

Improving strategies 

We now return to the problem of optimizing measurement strategies. Given some particular strategy characterized by $\mathcal{M}_\oplus(\alpha)$, we show how to construct another strategy which is at least as good as the original one.

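For any observable $B_\beta$, we consider the family $B_\beta(\varphi) = e^{i\varphi\tau} B_\beta\, e^{-i\varphi\tau}$. We may think of this as unitarily "rotating" $B_\beta$ within $\mathcal{B}$ in such a way that the unknown state $\tau$ is invariant. Selecting an arbitrary value of $\varphi$ and replacing the observables $B_\beta$ by $B_\beta(\varphi)$ gives a strategy that is obviously equivalent to the original one. Denoting its associated family of $\mathcal{M}$-operators by $\mathcal{M}'_\oplus(\alpha,\varphi)$, we have

$$ \mathrm{det}_\oplus\, \mathcal{M}'_\oplus(\alpha,\varphi) = \mathrm{det}_\oplus\, \mathcal{M}_\oplus(\alpha), \tag{6.12} $$

$$ \mathrm{Tr}_\oplus\big(\mathcal{M}'_\oplus(\alpha,\varphi)^{-1}\big) = \mathrm{Tr}_\oplus\big(\mathcal{M}_\oplus(\alpha)^{-1}\big) \tag{6.13} $$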

for all $\alpha$. We will now construct a further strategy out of these equivalent ones. We distribute the $n_\beta$ measurements originally reserved for $B_\beta$ among members of the family $B_\beta(\varphi)$. Technically, we introduce a probability distribution $\varphi \mapsto f(\varphi)$, according to which a value of $\varphi$ is drawn to determine the observable $B_\beta(\varphi)$ to be measured next. In a first step we may think of $f$ as a discrete distribution, admitting only particular values of $\varphi$. For sufficiently large $n_\beta$, however, this may be approximated arbitrarily well by a continuous distribution $f$. Hence, the average number of measurements carried out for observables $B_\beta(\varphi')$ with $\varphi < \varphi' < \varphi + d\varphi$ will be $n_\beta f(\varphi)\,d\varphi$.

It may of course happen that different observables $B_\beta$ effectively play the same role in the new strategy. This will happen if they are already "rotated" versions of each other, e.g. if $B_2 = e^{i\varphi\tau} B_1 e^{-i\varphi\tau}$ and $f(\varphi) \neq 0$ for some $\varphi$. In this case, the new strategy is effectively generated by a smaller set of observables than the original strategy, while the number of different observables actually measured will in general increase.

By construction, the operator $\mathcal{M}'_\oplus(\alpha)$ for the new strategy is given by the average

$$ \mathcal{M}'_\oplus(\alpha) = \int d\varphi\, f(\varphi)\, \mathcal{M}'_\oplus(\alpha,\varphi) . \tag{6.14} $$

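Since $M \mapsto M^{-1}$ is operator convex and $M \mapsto \ln M$ is operator concave, it follows from the Peierls-Bogoliubov inequality that

$$ \mathrm{det}_\oplus\, \mathcal{M}'_\oplus(\alpha) = \exp\Big(\mathrm{Tr}_\oplus \ln \mathcal{M}'_\oplus(\alpha)\Big) \ge \mathrm{det}_\oplus\, \mathcal{M}_\oplus(\alpha), \tag{6.15} $$

$$ \mathrm{Tr}_\oplus\big(\mathcal{M}'_\oplus(\alpha)^{-1}\big) \le \mathrm{Tr}_\oplus\big(\mathcal{M}_\oplus(\alpha)^{-1}\big), \tag{6.16} $$

where we have taken into account (6.12) and (6.13). These inequalities survive the limits (6.10) and (6.11), so that we conclude

$$ \det\mathcal{M}' \ge \det\mathcal{M}, \tag{6.17} $$

$$ \mathrm{Tr}(\mathcal{M}'^{-1}) \le \mathrm{Tr}(\mathcal{M}^{-1}). \tag{6.18} $$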

With respect to the measures of knowledge in both the volume and the distance oriented approach, the new strategy is better than (or as good as) the original one.

Let us now compute the operator $\mathcal{M}'_\oplus(\alpha)$ for the new strategy more explicitly. The spectral projection of $B_\beta(\varphi)$ with respect to the eigenvalue $a$ is $P_{\beta a}(\varphi) = e^{i\varphi\tau} P_{\beta a}\, e^{-i\varphi\tau}$. Hence $(\tau|P_{\beta a}(\varphi)) = (\tau|P_{\beta a})$ for any $\varphi$, so the denominators in (6.6) and (6.7) do not change. Since the basis vectors $e_I$ are eigenvectors of $\tau$, the components of the new spectral projections become $P_{\beta a,IJ}(\varphi) = \langle e_I|P_{\beta a}(\varphi)|e_J\rangle = e^{i\varphi(\tau_{II}-\tau_{JJ})}\langle e_I|P_{\beta a}|e_J\rangle = e^{i\varphi(\tau_{II}-\tau_{JJ})}\,P_{\beta a,IJ}$. Thus, when computing the components $\mathcal{M}'_{\oplus\,IJ,KL}(\alpha)$, the integral over $\varphi$ is to be taken over

$$ e^{i\varphi(\tau_{II}-\tau_{JJ}-\tau_{KK}+\tau_{LL})} . \tag{6.19} $$

To give this a simple form, we choose $f$ such that the integral over these expressions is non-zero only if $I = J$ and $K = L$, or $I = K$ and $J = L$. This gives

$$ \int d\varphi\, f(\varphi)\, e^{i\varphi(\tau_{II}-\tau_{JJ}-\tau_{KK}+\tau_{LL})} = \delta_{IJ}\delta_{KL} + \delta_{IK}\delta_{JL} - \delta_{IJKL}, \tag{6.20} $$

where $\delta_{IJKL} = 1$ if all four indices agree, and $0$ otherwise. Strictly speaking, this is only possible if the eigenvalues $\tau_{II}$ are sufficiently different from each other. If this is not the case, one may choose some appropriate hermitean operator $\xi$ commuting with $\tau$ and redefine $B_\beta(\varphi) = e^{i\varphi\xi} B_\beta\, e^{-i\varphi\xi}$. If the eigenvalues of $\xi$ are chosen to have only rational quotients, there is always a finite interval for the $\varphi$-integration such that (6.20) holds with $f(\varphi) = \mathrm{const}$. Otherwise one would have to use the invariant mean

$$ \int d\varphi\, f(\varphi) \ldots \;\to\; \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} d\varphi \ldots \tag{6.21} $$

With the choice (6.20), the transition from the old to the new strategy is simply achieved by

$$ \mathcal{M}'_{\oplus\,IJ,KL}(\alpha) = \big(\delta_{IJ}\delta_{KL} + \delta_{IK}\delta_{JL} - \delta_{IJKL}\big)\, \mathcal{M}_{\oplus\,IJ,KL}(\alpha) . \tag{6.22} $$

In effect, the average over equivalent strategies has cut off some of the original components but has left the remaining ones ($\mathcal{M}_{\oplus\,II,JJ}(\alpha)$ and $\mathcal{M}_{\oplus\,IJ,IJ}(\alpha)$) unchanged.
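The effect of the averaging step (6.22) can be checked on a random strategy. In this hedged sketch, the random observables and weights are illustrative assumptions, and the factor in (6.22) is applied as a 0/1 mask to the components of $\mathcal{M}_\oplus(\alpha)$. The two measures are obtained as follows:

- $\det\mathcal{M}$ is read off as the slope of $\mathrm{det}_\oplus\mathcal{M}_\oplus(\alpha)$, which is affine in $\alpha$ because $\mathcal{V}$ has rank one.
- $\mathrm{Tr}(\mathcal{M}^{-1})$ is approximated at one large $\alpha$.

The expected outcome is (6.17) and (6.18).

```python
import numpy as np

d = 3
rng = np.random.default_rng(3)
tau = np.diag([0.5, 0.3, 0.2]).astype(complex)

def random_projections():
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return [np.outer(q[:, a], q[:, a].conj()) for a in range(d)]

def M_plus(strategy, alpha):
    """Components of M_plus(alpha) = sum_beta n_beta Q(B_beta) + alpha V, cf. (6.7), (6.8)."""
    M = np.zeros((d * d, d * d), complex)
    for projs, n_b in strategy:
        for P in projs:
            p = P.reshape(-1)
            M += n_b * np.outer(p, p.conj()) / (2.0 * np.trace(tau @ P).real)
    return M + alpha * np.outer(np.eye(d).reshape(-1), np.eye(d).reshape(-1)) / d

# the factor (delta_IJ delta_KL + delta_IK delta_JL - delta_IJKL) of (6.22) as a 0/1 mask
idx = [(I, J) for I in range(d) for J in range(d)]
mask = np.array([[1.0 if (I == J and K == L) or (I == K and J == L) else 0.0
                  for (K, L) in idx] for (I, J) in idx])

def measures(M_of_alpha, big=1e9):
    # det_plus is affine in alpha (V has rank one), so its slope is det M, cf. (6.9)
    det_M = (np.linalg.det(M_of_alpha(1.0)) - np.linalg.det(M_of_alpha(0.0))).real
    # Tr(M^-1) through the limit (6.11), evaluated at one large alpha
    tr_M = np.trace(np.linalg.inv(M_of_alpha(big))).real
    return det_M, tr_M

strategy = [(random_projections(), 300) for _ in range(2 * d)]
det_old, tr_old = measures(lambda a: M_plus(strategy, a))
det_new, tr_new = measures(lambda a: mask * M_plus(strategy, a))
print(det_old, det_new, tr_old, tr_new)
```

The mask keeps the $\mathcal{V}$ components ($\delta_{IJ}\delta_{KL}$), so masking before or after adding $\alpha\mathcal{V}$ gives the same result.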



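Due to the block form of (6.22), any of the operators $\mathcal{M}'_\oplus(\alpha)$ leaves two subspaces of $\mathcal{B}$ invariant:

- $\mathcal{W}$, the $d$-dimensional subspace spanned by the basis elements $\{e_{II} \mid I = 1, \ldots, d\}$ (containing $1$ and $\tau$);
- its $d(d-1)$-dimensional orthogonal complement $\mathcal{W}^\perp$, spanned by the basis elements $\{e_{IJ} \mid I, J = 1, \ldots, d,\ I \neq J\}$.

Thus, it decomposes uniquely into the direct sum $\mathcal{M}'_\oplus(\alpha) = \mathcal{R}(\alpha) \oplus S$, where $\mathcal{R}(\alpha)$ acts on $\mathcal{W}$ and $S$ acts on $\mathcal{W}^\perp$. The components of these operators are

$$ \mathcal{R}_{IJ}(\alpha) = \mathcal{M}'_{\oplus\,II,JJ}(\alpha) \quad \forall\, I,J, \tag{6.23} $$

$$ S_{IJ,KL} = \begin{cases} \mathcal{M}'_{\oplus\,IJ,IJ}(\alpha) & \text{for } (IJ) = (KL),\ I\neq J,\ K\neq L, \\ 0 & \text{for } (IJ) \neq (KL),\ I\neq J,\ K\neq L . \end{cases} \tag{6.24} $$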
Since the indices of $S$ are understood as pairs $IJ$ with $I \neq J$, the array $S_{IJ,KL}$ forms a diagonal $d(d-1)\times d(d-1)$ matrix. As indicated, it is independent of $\alpha$, because the operator $\mathcal{V}$ defined in (6.5) acts as a projection in $\mathcal{W}$ and annihilates $\mathcal{W}^\perp$. Some algebra shows how our measures of knowledge may be expressed in terms of these objects. Let

$$ R_{IJ} = \mathcal{R}_{IJ}(0) , \tag{6.25} $$

which is the $d\times d$ matrix made up of the $II,JJ$ components of (6.8) when the $\alpha\mathcal{V}$-term is ignored, and

$$ E_{IJ} = 1 \quad \forall\, I,J , \tag{6.26} $$

reflecting the component structure of the $\alpha\mathcal{V}$-term in (6.8). Then

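$$ \det\mathcal{M}' = \frac{1}{d}\,\det R\;\mathrm{Tr}(R^{-1}E)\,\det S , \tag{6.27} $$

$$ \mathrm{Tr}(\mathcal{M}'^{-1}) = \mathrm{Tr}(R^{-1}) - \frac{\mathrm{Tr}(R^{-1}ER^{-1})}{\mathrm{Tr}(R^{-1}E)} + \mathrm{Tr}(S^{-1}) . \tag{6.28} $$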

When computing these two quantities, one may use the fact that they are invariant under the replacement $R \to R + cE$ for any constant $c$. The combination $\det R \times \mathrm{Tr}(R^{-1}E)$ may likewise be written as $\sum_{I,J}(-1)^{I+J}\det_{IJ}R$. Here $\det_{IJ}R$ is the determinant of the matrix obtained from $R$ by deleting the $I$-th row and the $J$-th column; recall from linear algebra that $(-1)^{I+J}\det_{IJ}R = \det R\,(R^{-1})_{JI}$. It follows that $\det\mathcal{M}'$ is a polynomial expression in the components $R_{IJ}$ and $S_{IJ,IJ}$.
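Several statements above can be checked on a generic positive matrix:

- formulas (6.27) and (6.28);
- their invariance under $R \to R + cE$;
- the cofactor form of $\det R\,\mathrm{Tr}(R^{-1}E)$.

The sketch uses a random positive definite $R$ as a stand-in (an assumption; no particular strategy is implied) and relies only on the rank-one structure of $E$.

```python
import numpy as np

d = 4
rng = np.random.default_rng(4)
X = rng.normal(size=(d, d))
R = X @ X.T + d * np.eye(d)     # generic positive definite stand-in for R
E = np.ones((d, d))             # E_IJ = 1, cf. (6.26)

def det_factor(R):
    """(1/d) det R Tr(R^-1 E): the R-dependent factor of (6.27)."""
    return np.linalg.det(R) * np.trace(np.linalg.solve(R, E)) / d

def trace_part(R):
    """Tr(R^-1) - Tr(R^-1 E R^-1) / Tr(R^-1 E): the R-dependent part of (6.28)."""
    Ri = np.linalg.inv(R)
    return np.trace(Ri) - np.trace(Ri @ E @ Ri) / np.trace(Ri @ E)

# det(R + (alpha/d) E) is affine in alpha (E has rank one): its slope is det_factor(R)
slope = np.linalg.det(R + E / d) - np.linalg.det(R)
# Tr((R + (alpha/d) E)^-1) at one large alpha approximates the limit leading to (6.28)
limit = np.trace(np.linalg.inv(R + (1e8 / d) * E))

def minor(R, I, J):
    """det_IJ R: determinant after deleting row I and column J."""
    return np.linalg.det(np.delete(np.delete(R, I, axis=0), J, axis=1))

cofactor_sum = sum((-1) ** (I + J) * minor(R, I, J) for I in range(d) for J in range(d))
print(slope, det_factor(R), limit, trace_part(R), cofactor_sum / d)
```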



Comparison of efficiency for different states 

To compare the efficiency of an improved strategy for different states in the volume oriented approach, we note that, according to (6.6) and (6.8), $\mathcal{M}(\lambda\tau_1 + (1-\lambda)\tau_2) \le \lambda\,\mathcal{M}(\tau_1) + (1-\lambda)\,\mathcal{M}(\tau_2)$ for $0 < \lambda < 1$. Hence, as in (6.15) and (6.16), the Peierls-Bogoliubov inequality guarantees that the strategy is more efficient for a state that is less mixed, i.e.

$$ \mathrm{det}_\oplus\, \mathcal{M}'_\oplus(\alpha, \lambda\tau_1 + (1-\lambda)\tau_2) \le \lambda\,\mathrm{det}_\oplus\, \mathcal{M}'_\oplus(\alpha,\tau_1) + (1-\lambda)\,\mathrm{det}_\oplus\, \mathcal{M}'_\oplus(\alpha,\tau_2) . \tag{6.29} $$

Later on, when discussing particular strategies, we will concentrate on "typical" states, i.e. the tracial state, which is maximally mixed, and states with some vanishing eigenvalues.
Strategy 1: Using mutually unbiased observables

From our result for the two-dimensional case, we guess that it is a good strategy to choose one observable in the direction of $\tau$, e.g. $B_1 = \tau$. For simplicity, we assume that $\tau$ is non-degenerate, so that its spectral projections are the one-dimensional operators $e_{II}$. If $\tau$ is degenerate, we slightly change it to some non-degenerate $\tilde\tau$ and re-insert $\tau$ at the very end of the computation. Following the spirit of Wootters and Fields [1], we seek to choose the other observables $B_\beta$ ($\beta \ge 2$) such that all eigenbases are mutually unbiased. It is not known whether such operators exist in arbitrary dimension $d$. However, the averaging method developed above provides a strategy that comes close to this idea and is realizable in any dimension. It requires just one other observable, $B_2$, satisfying

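$$ \mathrm{Tr}(P_{1a}P_{2a'}) = \frac{1}{d} \quad \forall\, a \in \mathrm{Sp}(B_1),\ a' \in \mathrm{Sp}(B_2). \tag{6.30} $$

This may also be written as

$$ P_{2a,II} = \frac{1}{d} \quad \forall\, I,\ \forall\, a \in \mathrm{Sp}(B_2) \tag{6.31} $$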

and implies $(\tau|P_{2a}) = 2/d$ for all $a \in \mathrm{Sp}(B_2)$. To these two observables we apply the strategy improving mechanism (6.22). If, however, there exists a large enough family of mutually unbiased bases, as in the explicit example given in [1], then all components $P_{\beta a,IJ}$ of $P_{\beta a}$ coincide up to phase factors, and we expect the strategy based on these to be equivalent to the one we will now analyze. In the two-dimensional case, this corresponds to the fact that we can either measure in two fixed orthogonal directions, as worked out explicitly in the preceding section, or alternatively in all directions orthogonal to $u$. In this case the averaging method does not lead to anything new.
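Condition (6.30) can always be met with the discrete Fourier basis. This is a standard example of a basis unbiased with respect to $\{e_I\}$, and it exists in every dimension. Choosing it for $B_2$ is an assumption of this sketch, not a prescription of the text. The code checks (6.30), (6.31) and $(\tau|P_{2a}) = 2/d$ for $d = 5$ and an arbitrary diagonal state.

```python
import numpy as np

d = 5
omega = np.exp(2j * np.pi / d)
# Fourier vectors f_a = d^(-1/2) sum_I omega^(a I) e_I: one choice of a basis unbiased to {e_I}
F = np.array([[omega ** (a * I) for a in range(d)] for I in range(d)]) / np.sqrt(d)
P1 = [np.diag(np.eye(d)[I]) for I in range(d)]                 # e_II, spectral projections of B_1
P2 = [np.outer(F[:, a], F[:, a].conj()) for a in range(d)]    # spectral projections of B_2

overlaps = np.array([[np.trace(p @ q).real for q in P2] for p in P1])   # (6.30)
diagonals = np.array([[P[I, I].real for I in range(d)] for P in P2])   # (6.31)

tau = np.diag([0.4, 0.25, 0.15, 0.12, 0.08])                  # an arbitrary diagonal state
tau_P2 = np.array([2.0 * np.trace(tau @ P).real for P in P2])  # (tau|P_2a)
print(overlaps.round(3), tau_P2)
```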

So let us start with $B_1 = \tau$ and $B_2$ satisfying (6.30). We leave $n_1$ and $n_2$ unspecified for the moment. With (6.8), the contributions to (6.22) for $\alpha = 0$ are as follows:

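$$ n_1 Q_{II,JJ}(B_1) = n_1 \sum_{a\in\mathrm{Sp}(B_1)} \frac{P_{1a,II}\,P_{1a,JJ}}{(\tau|P_{1a})} = \frac{n_1}{2\tau_{II}}\,\delta_{IJ} \quad \forall\, I,J, \tag{6.32} $$

$$ n_1 Q_{IJ,IJ}(B_1) = n_1 \sum_{a\in\mathrm{Sp}(B_1)} \frac{|P_{1a,IJ}|^2}{(\tau|P_{1a})} = 0 \quad \text{for } I\neq J, \tag{6.33} $$

$$ n_2 Q_{II,JJ}(B_2) = n_2 \sum_{a\in\mathrm{Sp}(B_2)} \frac{P_{2a,II}\,P_{2a,JJ}}{(\tau|P_{2a})} = \frac{n_2}{2} \quad \forall\, I,J, \tag{6.34} $$

$$ n_2 Q_{IJ,IJ}(B_2) = n_2 \sum_{a\in\mathrm{Sp}(B_2)} \frac{|P_{2a,IJ}|^2}{(\tau|P_{2a})} = \frac{n_2}{2} \quad \text{for } I\neq J. \tag{6.35} $$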

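Adding (6.32) + (6.34) and (6.33) + (6.35) gives all non-zero components of $\mathcal{M}'_\oplus(0)$. The only nonzero components of the operators $R$ and $S$ introduced in (6.23)-(6.25) are thus

$$ R_{IJ} = \frac{n_1}{2\tau_{II}}\,\delta_{IJ} + \frac{n_2}{2} \quad \forall\, I,J, \tag{6.36} $$

$$ S_{IJ,IJ} = \frac{n_2}{2} \quad \text{for } I \neq J. \tag{6.37} $$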
Using (6.27), our final result for the volume oriented approach reads

$$ \det\mathcal{M}' = \frac{1}{d}\left(\frac{n_1}{2}\right)^{d-1}\left(\frac{n_2}{2}\right)^{d^2-d}\det(\tau^{-1}) . \tag{6.38} $$

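For given $n = n_1 + n_2$, the best of all these strategies is characterized by $n_1 n_2^{\,d} = \max$, which leads to

$$ n_1 = \frac{n}{d+1} \quad\text{and}\quad n_2 = \frac{nd}{d+1} , \tag{6.39} $$

hence

$$ (\det\mathcal{M}')_{\max} = d^{\,d^2-d-1}\left(\frac{n}{2(d+1)}\right)^{d^2-1}\det(\tau^{-1}) . \tag{6.40} $$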
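The passage from (6.36)-(6.37) to (6.38), and the optimum (6.39)-(6.40), can be cross-checked numerically. The sketch evaluates (6.27) directly from $R$ and $S$ for an illustrative state and compares the result with the closed forms. The state and the value of $n$ are arbitrary assumptions. The sketch then scans the split $n_1 + n_2 = n$.

```python
import numpy as np

def det_strategy1(n1, n2, tau_diag):
    """det M' from (6.27), with R from (6.36) and the diagonal S from (6.37)."""
    d = len(tau_diag)
    E = np.ones((d, d))
    R = np.diag(n1 / (2.0 * tau_diag)) + 0.5 * n2 * E
    det_S = (0.5 * n2) ** (d * (d - 1))
    return np.linalg.det(R) * np.trace(np.linalg.solve(R, E)) * det_S / d

def det_638(n1, n2, tau_diag):
    d = len(tau_diag)
    return (0.5 * n1) ** (d - 1) * (0.5 * n2) ** (d * d - d) / np.prod(tau_diag) / d

def det_640(n, tau_diag):
    d = len(tau_diag)
    return d ** (d * d - d - 1) * (n / (2.0 * (d + 1))) ** (d * d - 1) / np.prod(tau_diag)

tau_diag = np.array([0.5, 0.3, 0.2])
n, d = 90.0, 3
n1_opt, n2_opt = n / (d + 1), n * d / (d + 1)          # (6.39)
scan = [det_strategy1(k, n - k, tau_diag) for k in np.linspace(1.0, n - 1.0, 400)]
print(det_strategy1(n1_opt, n2_opt, tau_diag), det_640(n, tau_diag), max(scan))
```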

If $d = 2$, this coincides with the value (5.8) for the best two-dimensional (volume oriented) strategy. However, as we shall see, in higher dimensions there are states $\tau$ for which one can do better. Analogously, using (6.28), we find for the distance oriented approach

$$ \mathrm{Tr}(\mathcal{M}'^{-1}) = \frac{2}{n_1}\big(1 - \mathrm{Tr}(\tau^2)\big) + \frac{2d(d-1)}{n_2} . \tag{6.41} $$

For given $n = n_1 + n_2$, the best of all these strategies is characterized by

$$ n_1 = \frac{n}{1 + \sqrt{\dfrac{d(d-1)}{1-\mathrm{Tr}(\tau^2)}}} \quad\text{and}\quad n_2 = \frac{n}{1 + \sqrt{\dfrac{1-\mathrm{Tr}(\tau^2)}{d(d-1)}}} , \tag{6.42} $$
hence

$$ \mathrm{Tr}(\mathcal{M}'^{-1})_{\min} = \frac{2}{n}\left(\sqrt{1-\mathrm{Tr}(\tau^2)} + \sqrt{d(d-1)}\right)^2 . \tag{6.43} $$

If $d = 2$, this coincides with the value (5.16) for the best two-dimensional (distance oriented) strategy. Whether one can do better in higher dimensions is an open question.
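The distance oriented approach can be checked in the same way. The sketch below does three things:

- it evaluates (6.28) with $R$ and $S$ from (6.36)-(6.37) and compares the result with (6.41);
- it checks that the split (6.42) attains (6.43) and that no other split does better;
- it confirms the $d = 2$ reduction to (5.16) for $|u| = 0.6$, i.e. $\tau = \mathrm{diag}(0.8, 0.2)$.

The state and the sample sizes are illustrative assumptions.

```python
import numpy as np

def trace_via_628(n1, n2, tau_diag):
    """Tr(M'^-1) for strategy 1 from (6.28), with R and S as in (6.36), (6.37)."""
    d = len(tau_diag)
    E = np.ones((d, d))
    Ri = np.linalg.inv(np.diag(n1 / (2.0 * tau_diag)) + 0.5 * n2 * E)
    tr_S_inv = d * (d - 1) * 2.0 / n2                  # S has d(d-1) entries n2/2
    return np.trace(Ri) - np.trace(Ri @ E @ Ri) / np.trace(Ri @ E) + tr_S_inv

def trace_641(n1, n2, tau_diag):
    d = len(tau_diag)
    return 2.0 / n1 * (1.0 - np.sum(tau_diag ** 2)) + 2.0 * d * (d - 1) / n2

def trace_643(n, tau_diag):
    d = len(tau_diag)
    return 2.0 / n * (np.sqrt(1.0 - np.sum(tau_diag ** 2)) + np.sqrt(d * (d - 1))) ** 2

tau_diag = np.array([0.4, 0.3, 0.2, 0.1])
n, d = 1000.0, 4
a, b = 1.0 - np.sum(tau_diag ** 2), d * (d - 1)
n1_opt = n / (1.0 + np.sqrt(b / a))                    # (6.42)
n2_opt = n - n1_opt
worst_gap = min(trace_641(k, n - k, tau_diag) - trace_643(n, tau_diag)
                for k in np.linspace(1.0, n - 1.0, 500))
print(trace_641(n1_opt, n2_opt, tau_diag), trace_643(n, tau_diag), worst_gap)
```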

Summarizing, the strategies specified by (6.39) and (6.42) are, in a sense, the natural generalizations of the two-dimensional case, and their effectiveness is quantified by (6.40) and (6.43).

Strategy 2: Using matrix units 

We will now construct, for even dimensions, a different strategy that sometimes works better in the volume oriented approach. From the two-dimensional situation we have learned that the uncertainties (in both the volume and the distance oriented approach) are smaller when the unknown state is less mixed. As in the strategy constructed above, we choose one observable, $B_1$, coinciding with $\tau$. The other observables should give as much new information as possible and should therefore be sufficiently independent of $\tau$. They are maximally independent if they are mutually unbiased, but then the uncertainties tend to be large. Two effects are therefore competing, and we have observed that in two dimensions independence is the dominating effect. In higher dimensions, a convenient basis of $\mathcal{B}$ is given by the matrix units (6.1), constructed from an eigenbasis of $\tau$. These operators are not positive (not even hermitean) and therefore do not correspond to observables. We therefore resort to the $d(d-1)$ projections defined by ($I < J$)

$$ P^\pm_{IJ} = \frac{1}{2}\left(e_{II} \pm e_{IJ} \pm e_{JI} + e_{JJ}\right) . \tag{6.44} $$

Our goal is to construct the remaining observables out of these operators. As before, we understand that the averaging procedure (6.22) has been performed. In effect, this just means taking into account only those components of $\mathcal{M}_{\oplus\,IJ,KL}(0)$ that are relevant for $R_{IJ}$ and $S_{IJ,IJ}$ as defined in (6.23)-(6.25). Any $P^\pm_{KL}$ ($K < L$) appearing as a spectral projection of an observable $B_\beta$ gives the contributions

$$ n_\beta\,\frac{P^\pm_{KL,II}\,P^\pm_{KL,JJ}}{(\tau|P^\pm_{KL})} = \frac{n_\beta}{4}\,\frac{(\delta_{IK}+\delta_{IL})(\delta_{JK}+\delta_{JL})}{\tau_{KK}+\tau_{LL}} \quad \forall\, I,J, \tag{6.45} $$

$$ n_\beta\,\frac{|P^\pm_{KL,IJ}|^2}{(\tau|P^\pm_{KL})} = \frac{n_\beta}{4}\,\frac{\delta_{IK}\delta_{JL}+\delta_{JK}\delta_{IL}}{\tau_{KK}+\tau_{LL}} \quad \text{for } I\neq J \tag{6.46} $$
to $R_{IJ}$ and $S_{IJ,IJ}$, respectively. These expressions have to be summed over all projectors involved. The contributions from $B_1$ are identical with (6.32)-(6.33).

Let us now show how the projections (6.44) may be used to define suitable observables. The idea is to group these operators into $d-1$ subfamilies, each containing $d$ elements, in order to construct $d-1$ observables in addition to $B_1$. We restrict ourselves to even $d$ and define $B_2$ to have the spectral projections

$$ P^+_{12},\ P^-_{12},\ P^+_{34},\ P^-_{34},\ \ldots,\ P^+_{(d-1)d},\ P^-_{(d-1)d} . \tag{6.47} $$

The eigenvalues are irrelevant as long as each observable is non-degenerate.

This may be abbreviated in terms of the partition

$$ B_2 \longleftrightarrow (1,2)(3,4)\ldots(d-1,d) \tag{6.48} $$

of $(1, 2, \ldots, d)$. The remaining observables are obtained by appropriately permuting certain numbers in the above partition, such that no pair occurs twice. This is possible in any even dimension and is best explained by an example. For illustration we choose $d = 6$ and define



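$$ \begin{aligned} B_2 &\longleftrightarrow (1,2)(3,4)(5,6)\\ B_3 &\longleftrightarrow (1,3)(2,5)(4,6)\\ B_4 &\longleftrightarrow (1,5)(3,6)(2,4)\\ B_5 &\longleftrightarrow (1,6)(5,4)(3,2)\\ B_6 &\longleftrightarrow (1,4)(6,2)(3,5) \end{aligned} \tag{6.49} $$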

The underlying general procedure is the following. One number in every pair moves to the right and the other to the left as long as possible; then it is reflected. In this way every number corresponds to a line, and every line crosses every other line exactly once. For $d > 4$ there are other possible permutation schemes, which should all be taken into account when the best of these strategies is to be determined. A strategy is fixed by giving every observable $B_\beta$ ($\beta = 1, \ldots, d$) a weight $n_\beta$, the number of measurements reserved for the family $B_\beta(\varphi)$, such that $\sum_{\beta=1}^{d} n_\beta = n$.
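The requirement that no pair occurs twice means that the observables $B_2, \ldots, B_d$ form a 1-factorisation of the pairs of $\{1, \ldots, d\}$. The sketch below checks the $d = 6$ example (6.49). It also generates such schemes for other even $d$ with the standard round-robin ("circle") method. That method is swapped in for illustration: it produces a valid scheme, but not necessarily the same labelling as the reflection procedure described in the text.

```python
def round_robin(d):
    """Circle-method 1-factorisation of {1,...,d} (d even) into d-1 perfect matchings."""
    others = list(range(2, d + 1))          # 1 stays fixed, the others rotate
    rounds = []
    for _ in range(d - 1):
        ring = [1] + others
        rounds.append([(ring[k], ring[d - 1 - k]) for k in range(d // 2)])
        others = others[-1:] + others[:-1]  # rotate by one position
    return rounds

def is_one_factorisation(rounds, d):
    """Each round is a perfect matching and every pair {K, L} occurs exactly once."""
    pairs = [frozenset(p) for r in rounds for p in r]
    matchings_ok = all(sorted(x for p in r for x in p) == list(range(1, d + 1)) for r in rounds)
    return matchings_ok and len(pairs) == len(set(pairs)) == d * (d - 1) // 2

paper_d6 = [                                # the scheme (6.49), B_2 ... B_6
    [(1, 2), (3, 4), (5, 6)],
    [(1, 3), (2, 5), (4, 6)],
    [(1, 5), (3, 6), (2, 4)],
    [(1, 6), (5, 4), (3, 2)],
    [(1, 4), (6, 2), (3, 5)],
]
print(is_one_factorisation(paper_d6, 6), round_robin(4))
```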

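To write down the operators $R$ and $S$ for this type of strategy, we note that, for given $K, L$ ($K \neq L$), either the pair $P^\pm_{KL}$ or the pair $P^\pm_{LK}$ occurs in some $B_\beta$ ($\beta \ge 2$). Let us denote this $\beta$ by $\beta(K,L)$. Using this notation, we sum (6.45) and (6.32) for $R$, and (6.46) and (6.33) for $S$, to obtain

$$ R_{IJ} = \frac{n_1}{2\tau_{II}}\,\delta_{IJ} + \sum_{K\neq I}\frac{n_{\beta(I,K)}}{2(\tau_{II}+\tau_{KK})}\,\big(\delta_{IJ}+\delta_{JK}\big) , \tag{6.50} $$

$$ S_{IJ,IJ} = \frac{n_{\beta(I,J)}}{2(\tau_{II}+\tau_{JJ})} \quad \text{for } I\neq J . \tag{6.51} $$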
The explicit evaluation of (6.27) and (6.28) for a general strategy of this type, and its comparison with our previous results (6.40) and (6.43), is not an easy task. We therefore confine ourselves to a family of examples. Let $d \ge 4$, $\tau_{11} = \tau_{22} = a/2$ and $\tau_{33} = \ldots = \tau_{dd} = (1-a)/(d-2)$, and set $n_\beta = n'$ for all $\beta = 2, \ldots, d$ (i.e. $n_{\beta(I,J)} = n'$ for all $I \neq J$). For small $a$, the combination $(\tau_{11}+\tau_{22})^{-1}$ is large, and this blows up the determinant of $\mathcal{M}'$. We find $R_{11} = R_{22} = (2n_1 + n')/(2a) + O(1)$ and $R_{12} = R_{21} = S_{12,12} = S_{21,21} = n'/(2a) + O(1)$, whereas all other components remain finite as $a \to 0$. Applying (6.27) to (6.50)-(6.51) exhibits the behaviour

$$ \det\mathcal{M}' \sim O(a^{-4}) \quad \text{for small } a . \tag{6.52} $$

This may be compared with (6.40), which for the same $\tau$ diverges only as $O(a^{-2})$. Hence, for any given even dimension $d \ge 4$, there is always an unknown state $\tau$ (defined by sufficiently small $a$) for which a strategy of type 2 is better than strategy 1 in the volume oriented approach. In the distance oriented approach, there is no such difference in the scaling behaviour for $a \to 0$.
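The $O(a^{-4})$ behaviour (6.52) can be observed numerically. The sketch below rests on illustrative assumptions: $d = 4$, $n_1 = n' = 100$, and a round-robin pairing, which does not matter here because all $n_\beta$ are equal. It proceeds in three steps:

- it builds $R$ and $S$ directly from the projections (6.44) via (6.7);
- it checks them against (6.50)-(6.51);
- it compares how much (6.27) grows when $a$ shrinks by a factor of 10 with the growth of (6.40).

```python
import numpy as np

def round_robin(d):
    """Circle-method pairing of {0,...,d-1} (d even): d-1 rounds, each pair exactly once."""
    others = list(range(1, d))
    rounds = []
    for _ in range(d - 1):
        ring = [0] + others
        rounds.append([(ring[k], ring[d - 1 - k]) for k in range(d // 2)])
        others = others[-1:] + others[:-1]
    return rounds

def R_and_S(tau_diag, n1, n_prime):
    """R_IJ = M_{II,JJ}(0) and S_IJ = M_{IJ,IJ} from (6.7) for B_1 = tau and the P^{+-}_{KL}."""
    d = len(tau_diag)
    observables = [([np.diag(np.eye(d)[I]) for I in range(d)], n1)]
    for pairs in round_robin(d):
        projs = []
        for K, L in pairs:
            for sign in (1.0, -1.0):
                v = np.zeros(d)
                v[K], v[L] = 1.0, sign
                projs.append(np.outer(v, v) / 2.0)      # P^{+-}_{KL} of (6.44)
        observables.append((projs, n_prime))
    R, S = np.zeros((d, d)), np.zeros((d, d))
    for projs, weight in observables:
        for P in projs:
            denom = 2.0 * np.dot(tau_diag, np.diag(P))   # (tau|P) = 2 sum_I tau_II P_II
            R += weight * np.outer(np.diag(P), np.diag(P)) / denom
            S += weight * P * P / denom                  # P is real here
    np.fill_diagonal(S, 0.0)                             # S lives on I != J only
    return R, S

def R_650(tau_diag, n1, n_prime):
    """(6.50) with n_beta(I,K) = n' for all pairs."""
    d = len(tau_diag)
    R = np.diag(n1 / (2.0 * tau_diag))
    for I in range(d):
        for K in range(d):
            if K != I:
                c = n_prime / (2.0 * (tau_diag[I] + tau_diag[K]))
                R[I, I] += c
                R[I, K] += c
    return R

def det_M(tau_diag, n1, n_prime):
    """det M' from (6.27)."""
    d = len(tau_diag)
    R, S = R_and_S(tau_diag, n1, n_prime)
    E = np.ones((d, d))
    s_entries = S[~np.eye(d, dtype=bool)]
    return np.linalg.det(R) * np.trace(np.linalg.solve(R, E)) * np.prod(s_entries) / d

def tau_family(a, d):
    return np.array([a / 2, a / 2] + [(1 - a) / (d - 2)] * (d - 2))

def det_strategy1(n, tau_diag):
    """(6.40), for comparison."""
    d = len(tau_diag)
    return d ** (d * d - d - 1) * (n / (2.0 * (d + 1))) ** (d * d - 1) / np.prod(tau_diag)

d, n1, n_prime = 4, 100.0, 100.0
n = n1 + (d - 1) * n_prime
small, smaller = tau_family(1e-3, d), tau_family(1e-4, d)
growth_type2 = det_M(smaller, n1, n_prime) / det_M(small, n1, n_prime)   # about 10^4
growth_type1 = det_strategy1(n, smaller) / det_strategy1(n, small)       # about 10^2
print(growth_type2, growth_type1)
```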

For $a = 2/d$, we obtain the tracial state $\tau = d^{-1}1$, i.e. $\tau_{II} = d^{-1}$ for all $I$. In this case we can be more explicit and obtain

$$ \det\mathcal{M}' = \left(\frac{d}{4}\right)^{d^2-1}\Big(\big[2n_1 + n'(d-2)\big]\,n'^{\,d}\Big)^{d-1} , \tag{6.53} $$

$$ \mathrm{Tr}(\mathcal{M}'^{-1}) = 4(d-1)\left(\frac{1}{d\,[2n_1 + n'(d-2)]} + \frac{1}{n'}\right) . \tag{6.54} $$

Interestingly, if $d \ge 4$, both expressions are optimized by $n_1 = 0$, i.e. $n' = n/(d-1)$. Hence, the best values for this class of strategies for the tracial state are given by
given by 



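$$ (\det\mathcal{M}')_{\max} = \left(\frac{d}{4}\right)^{d^2-1}\left[(d-2)\left(\frac{n}{d-1}\right)^{d+1}\right]^{d-1} , \tag{6.55} $$

$$ \mathrm{Tr}(\mathcal{M}'^{-1})_{\min} = \frac{4(d-1)^2}{n}\left(1 + \frac{1}{d(d-2)}\right) . \tag{6.56} $$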



The volume oriented strategy (6.55) is beaten by (6.40), because for the tracial state (6.40) > (6.55) for all $d$. Asymptotically for large $d$, (6.40) exceeds (6.55) by a factor of leading order $2^{d^2}$. Similarly, the distance oriented strategy (6.56) is worse than (6.43): for large $d$ and the tracial state, (6.56) is twice as large as (6.43).
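The statements about the tracial state can be checked from (6.53)-(6.54) alone. The sketch below confirms three things:

- $n_1 = 0$ is optimal for $d \ge 4$;
- (6.40) exceeds (6.55) for even $d$ up to 20;
- the ratio of (6.56) to (6.43) approaches 2 for large $d$.

The sample size $n$ and the range of $d$ are arbitrary choices. Logarithms are used to avoid floating-point overflow.

```python
import math
import numpy as np

def log_det_640_tracial(n, d):
    """log of (6.40) for tau = 1/d, where det(tau^-1) = d^d."""
    return (d * d - d - 1) * math.log(d) + (d * d - 1) * math.log(n / (2.0 * (d + 1))) + d * math.log(d)

def log_det_653(n1, n_prime, d):
    """log of (6.53)."""
    return (d * d - 1) * math.log(d / 4.0) + (d - 1) * (math.log(2 * n1 + n_prime * (d - 2)) + d * math.log(n_prime))

def trace_654(n1, n_prime, d):
    return 4.0 * (d - 1) * (1.0 / (d * (2 * n1 + n_prime * (d - 2))) + 1.0 / n_prime)

def trace_643_tracial(n, d):
    return 2.0 / n * (math.sqrt(1.0 - 1.0 / d) + math.sqrt(d * (d - 1))) ** 2

n = 1000.0
optimum_at_n1_zero = {}
for d in (4, 6, 8, 10):
    n1_values = np.linspace(0.0, 0.9 * n, 50)           # n' = (n - n1) / (d - 1)
    dets = [log_det_653(x, (n - x) / (d - 1), d) for x in n1_values]
    traces = [trace_654(x, (n - x) / (d - 1), d) for x in n1_values]
    optimum_at_n1_zero[d] = (int(np.argmax(dets)) == 0, int(np.argmin(traces)) == 0)

# (6.55) and (6.56) are the n1 = 0 values; compare them with (6.40) and (6.43)
log_gap = {d: log_det_640_tracial(n, d) - log_det_653(0.0, n / (d - 1), d) for d in range(4, 21, 2)}
trace_ratio_large_d = trace_654(0.0, n / 399, 400) / trace_643_tracial(n, 400)
print(optimum_at_n1_zero, log_gap[4], trace_ratio_large_d)
```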

It is easy to show that the last feature remains true for general states if $d \ge 6$:
Using the estimates 

together with (6.28), we find 

$$ \mathrm{Tr}(\mathcal{M}'^{-1}) \ge \frac{4(d-1)^2}{n} . \tag{6.59} $$

It follows that, also for general states in $d \ge 6$, our strategy of type 2 cannot beat (6.43).

Summarizing, for even dimensions $d \ge 4$ there are states $\tau$ for which the strategy (6.40), based on mutually unbiased observables, is not optimal when evaluated in the volume oriented approach. In the distance oriented approach, on the other hand, we cannot offer a strategy better than (6.43).



Remarks on infinite dimensions 

The results obtained in this paper suggest that the number of measurements needed to estimate the unknown state $\tau$ with an uncertainty of order $\epsilon$ increases like $d^2$ with the dimension $d$ of the Hilbert space $\mathcal{H}$. This may be seen in both approaches we discussed. Identifying $\epsilon^2$ with $\mathrm{Tr}(\mathcal{M}^{-1})$ in the distance oriented approach, (6.43) implies $\epsilon^2 \approx 2d^2/n$, hence $n \sim (d/\epsilon)^2$. The analogous situation for the volume oriented approach is roughly modeled by identifying $(\det\mathcal{M})^{-1/2}$ with the volume of a sphere of radius $\epsilon$ in $(d^2-1)$-dimensional Euclidean space. Using Stirling's formula, this volume is, for large $d$, given by

$$ V_\epsilon = \frac{\pi^{(d^2-1)/2}\,\epsilon^{d^2-1}}{\Gamma\!\left(\frac{d^2+1}{2}\right)} \approx \frac{1}{\sqrt{(d^2-1)\pi}}\left(\frac{2\pi e\,\epsilon^2}{d^2-1}\right)^{(d^2-1)/2} . \tag{6.60} $$



With (6.40) (the strategy based on mutually unbiased observables) and fixed $\det(\tau^{-1})$, this gives $n \sim (d/\epsilon)^2$, as in the distance oriented approach. This behaviour is confirmed by the strategies of type 2 (using matrix units), which were found to be better for certain states. Since we need an additional number of measurements to obtain a first rough estimate of $\tau$, the formula $n \sim (d/\epsilon)^2$ is to be understood as the leading asymptotic behaviour of $n$ as $\epsilon$ approaches 0.
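The Stirling approximation in (6.60) can be compared with the exact ball volume $\pi^{N/2}\epsilon^N/\Gamma(N/2+1)$, where $N = d^2 - 1$. The sketch works with logarithms via `math.lgamma`. The large-$N$ form should be off by about $1/(6N)$ in the logarithm, which is the first correction term of Stirling's series. The function names are hypothetical.

```python
import math

def log_ball_volume(eps, N):
    """Exact log-volume of a ball of radius eps in R^N."""
    return 0.5 * N * math.log(math.pi) + N * math.log(eps) - math.lgamma(0.5 * N + 1.0)

def log_ball_volume_large_N(eps, N):
    """Stirling form used in (6.60): (2 pi e eps^2 / N)^(N/2) / sqrt(N pi)."""
    return 0.5 * N * math.log(2.0 * math.pi * math.e * eps ** 2 / N) - 0.5 * math.log(N * math.pi)

errors = {}
for d in (3, 10, 30):
    N = d * d - 1                      # dimension of the real space B_0
    errors[N] = log_ball_volume(0.01, N) - log_ball_volume_large_N(0.01, N)
print(errors)
```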

If this behaviour holds for the best possible strategies, it has dramatic consequences for the infinite dimensional case: at first glance, it would be altogether impossible to determine $\tau$ with some given uncertainty $\epsilon$. However, in infinite dimensions we may decompose the Hilbert space as $\mathcal{H} = P_d(\mathcal{H}) \oplus P_d(\mathcal{H})^\perp$, where $P_d$ is some finite ($d$-)dimensional hermitean projection, and measure $P_d$ in a number of copies of our quantum system. Starting with $d = 1$, we choose a one-dimensional hermitean projection $P_1$. Whenever the measurement outcome is 0, i.e. corresponds to $P_d(\mathcal{H})^\perp$, we redefine $d_{\rm new} = d + 1$, choose some new decomposition such that $P_{d_{\rm new}} > P_d$, and proceed analogously. During this process, the probability that 0 occurs in a further measurement, given by $1 - \mathrm{Tr}(\tau P_d)$, drops to zero as $d$ increases. In other words, the measurement data become increasingly consistent with the expectation that $\tau$ is a density matrix in $P_d(\mathcal{H})$. If $\rho_{\rm ex}$ is the expected state, the uncertainty $\epsilon$ about $\tau$ is given by $\epsilon^2 \approx \mathrm{Tr}((\tau - \rho_{\rm ex})^2)$. In an appropriate block matrix notation we have

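$$ \tau = \begin{pmatrix} \tau_d & \nu \\ \nu^\dagger & \tau_\perp \end{pmatrix}, \qquad \rho_{\rm ex} = \begin{pmatrix} \rho_d & 0 \\ 0 & 0 \end{pmatrix}, \tag{6.61} $$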

so that $\mathrm{Tr}((\tau-\rho_{\rm ex})^2) = \mathrm{Tr}((\tau_d - \rho_d)^2) + 2\,\mathrm{Tr}(\nu^\dagger\nu) + \mathrm{Tr}(\tau_\perp^2)$. For given $\epsilon_0 > 0$, there is a finite dimension $d_{\rm eff}$ and a finite number $n_0$ of measurements that suffice to make sure that $2\,\mathrm{Tr}(\nu_{d_{\rm eff}}^\dagger\nu_{d_{\rm eff}}) + \mathrm{Tr}(\tau_{\perp,d_{\rm eff}}^2) < \epsilon_0^2$. The numbers $d_{\rm eff}$ and $n_0$ will depend on $\tau$ and on the chosen sequence of projections $P_1, P_2, \ldots$. Once this point is reached, we proceed as if $\tau$ acted entirely in the subspace $P_{d_{\rm eff}}(\mathcal{H})$. (Technically, we measure observables of the type $A \oplus (1 - P_{d_{\rm eff}})$, where $A$ acts in $P_{d_{\rm eff}}(\mathcal{H})$, and ignore further outcomes that belong to the remaining infinite dimensional subspace $P_{d_{\rm eff}}(\mathcal{H})^\perp$.) Next, given $\epsilon_1$, we need a further number $n_1 \sim (d_{\rm eff}/\epsilon_1)^2$ of measurements to arrive at a final estimate $\rho_{\rm fin}$ in $P_{d_{\rm eff}}(\mathcal{H})$ such that $\mathrm{Tr}((\tau_{d_{\rm eff}} - \rho_{\rm fin})^2) < \epsilon_1^2$. Hence, after $n = n_0 + n_1$ measurements, the total uncertainty is of order $\epsilon = (\epsilon_0^2 + \epsilon_1^2)^{1/2}$. This procedure makes it possible to determine an unknown state to any desired accuracy, even if it lives in an infinite dimensional Hilbert space. (This is of course not an optimal strategy. A more efficient method is, e.g., to combine the two parts of the procedure and to measure observables of the type $A \oplus (1 - P_d)$ from the outset.)

The apparent contradiction between this result and the behaviour $n \sim (d/\epsilon)^2$ for large but finite dimension $d$ is resolved by noting that the number $n_0$ may be very large. Suppose some sequence of projections $P_{d+1} = P_d + |e_{d+1}\rangle\langle e_{d+1}|$ has been fixed, with $e_I$ an orthonormal basis of $\mathcal{H}$ and starting point $P_1 = |e_1\rangle\langle e_1|$. Suppose further that $\tau = |e_D\rangle\langle e_D|$ for some $D$, which may be very large. In this case it takes $D$ measurements until a non-zero outcome is possible. Similar scenarios are possible for any $\tau$: given an arbitrary number $N$, it is always possible (with some bad luck) to adjust the sequence of projections such that $n_0 > N$. Hence, there is no general upper bound for $n_0$, and thus for $n$. This feature is not present in the finite dimensional case. The behaviour $n \to \infty$ obtained by letting $d \to \infty$ in the formula $n \sim (d/\epsilon)^2$ must be understood in this sense.
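The dimension-growing procedure, and the example of a pure state $\tau = |e_D\rangle\langle e_D|$, can be simulated in a few lines. The sketch rests on a simplification made for illustration only. It assumes that $\tau$ is diagonal in the fixed basis $\{e_I\}$, with $P_d$ projecting onto $\mathrm{span}(e_1, \ldots, e_d)$. Measuring $P_d$ then reduces to a biased coin with success probability $\mathrm{Tr}(\tau P_d)$. `grow_subspace` and its parameters are hypothetical.

```python
import numpy as np

def grow_subspace(probs, shots, seed=0):
    """
    Sketch of the d-growing procedure.  Assumption: tau is diagonal in the fixed basis
    e_1, e_2, ... with eigenvalues probs[0], probs[1], ..., and P_d projects onto
    span(e_1, ..., e_d), so that outcome 1 has probability Tr(tau P_d).
    Returns the final d and the index of the first shot with outcome 1 (None if none).
    """
    rng = np.random.default_rng(seed)
    cumulative = np.cumsum(probs)
    d, first_hit = 1, None
    for shot in range(1, shots + 1):
        p_inside = cumulative[min(d, len(probs)) - 1]      # Tr(tau P_d)
        if rng.random() < p_inside:
            if first_hit is None:
                first_hit = shot
        else:
            d += 1                                          # outcome 0: enlarge P_d
    return d, first_hit

# pure state tau = |e_D><e_D| with D = 25: no outcome 1 before shot D
pure = np.zeros(100)
pure[24] = 1.0
d_pure, hit_pure = grow_subspace(pure, shots=200)

# geometric spectrum tau_kk ~ 2^-k: the weight outside P_d shrinks as d grows
geometric = 0.5 ** np.arange(1, 61)
geometric /= geometric.sum()
d_geo, _ = grow_subspace(geometric, shots=2000, seed=1)
outside_weight = 1.0 - np.cumsum(geometric)[d_geo - 1]
print(d_pure, hit_pure, d_geo, outside_weight)
```

Choosing a larger $D$ in the pure-state case pushes the first non-zero outcome arbitrarily far out, which is the absence of a general bound on $n_0$ described above.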

A problem still persists with our approach. It stems from the fact that we have invoked a Gaussian approximation. For finite dimension $d$, we infer from (2.15) and (2.22) that this approximation is reliable if $\epsilon\,\|\tau^{-1}\| \ll 1$. In other words, if $\epsilon$ is chosen too large, our formalism will fail to reproduce the unknown state with the promised accuracy. As a consequence, we must have $n \gg d^2\|\tau^{-1}\|$, which means that a smaller number of measurements will not lead to a reasonable result. This introduces an additional dependence on the dimension into the state determination problem: since $\|\tau^{-1}\| \ge d$, we have $\epsilon \ll d^{-1}$ and $n \gg d^3$. Moreover, in large dimensions, typical density matrices tend to have an even larger $\|\tau^{-1}\|$. In the infinite dimensional case, $\|\tau^{-1}\|$ is no longer finite. Even when the problem is reduced to an effectively finite dimensional one, as sketched above, we can expect the density matrix $\tau_{d_{\rm eff}}$ to have a very large (if not infinite) value of $\|\tau_{d_{\rm eff}}^{-1}\|$. This in turn requires a correspondingly small $\epsilon$ and blows up $n$. A partial cure for this dilemma is to modify the determination of $P_{d_{\rm eff}}(\mathcal{H})$: for any redefinition $d_{\rm new} = d + 1$, one tests statistically whether a large enough portion of $\tau$ is gained, and undoes the redefinition otherwise. Thus, the small eigenvalues of $\tau$ may be kept in $P_{d_{\rm eff}}(\mathcal{H})^\perp$, and only the large ones are taken into account. In effect, we expect such a procedure to reduce the number of measurements needed.